Microsoft’s claim doesn’t address so-called “hallucinations,” in which the model confidently produces incorrect text. GPT is also weak at games such as chess and Go, poor at math, and can produce code riddled with errors and bugs. That doesn’t mean LLMs like GPT are all hype. Far from it. It does mean that the discussion around generative AI needs a sense of balance and far less exaggerated packaging.
According to an article in IEEE Spectrum, several experts, including OpenAI co-founder and chief scientist Ilya Sutskever, believe that LLM hallucinations can be eliminated by adding reinforcement learning from human feedback. Other experts, including Yann LeCun, chief AI scientist at Meta, and Geoffrey Hinton, the deep learning pioneer who recently left Google, argue that today’s large language models are fundamentally flawed. In their view, large language models lack the nonverbal knowledge needed to understand the reality that language describes.
Diffblue CEO Matthew Lodge told InfoWorld that “small, fast, cheap-to-run reinforcement learning models easily outperform LLMs with hundreds of billions of parameters for everything from gaming to writing code.”
If so, are we looking for gold in the wrong place?
Shall we play a game?
As Lodge says, we may be pushing generative AI into areas where reinforcement learning can do much better. Games are a prime example. In a video posted by chess International Master Levy Rozman playing chess against ChatGPT, ChatGPT makes outrageous moves, such as capturing its own knight, and even makes illegal moves. Against Stockfish, an open-source chess engine built on deep game-tree search rather than a language model, ChatGPT resigned within 10 moves. It’s a good example of how far LLMs fall short of the hype.
Google DeepMind’s AlphaGo is based on reinforcement learning. Reinforcement learning generates and tries multiple solutions to a problem, uses the results to improve the next attempt, and repeats this process thousands of times to find the best result. In AlphaGo’s case, the AI tries various moves and predicts whether each one is a good move and whether the resulting position is likely to win. It uses this feedback to follow winning sequences and to generate other candidate moves. This kind of process is called probabilistic search; AlphaGo specifically uses Monte Carlo tree search. The method is very effective for gameplay: AlphaGo has beaten top professional Go players, including Lee Sedol in 2016. AlphaGo isn’t perfect either, but at Go it far outperforms the best current LLMs.
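To make the “try moves, play them out, use the results” idea concrete, here is a minimal Python sketch of flat Monte Carlo search, a simplified relative of the Monte Carlo tree search AlphaGo uses. It is an illustration under stated assumptions, not AlphaGo’s method: there is no search tree and no neural network, and the game is a toy version of Nim (players alternately remove 1 to 3 stones; whoever takes the last stone wins), chosen as a hypothetical stand-in for Go. For each legal move, the code plays many random games to the end and keeps the move with the highest win rate.

```python
import random

MAX_TAKE = 3  # toy Nim rule (assumption): remove 1 to 3 stones per turn; taking the last stone wins


def legal_moves(stones: int) -> list[int]:
    """Moves available when `stones` stones remain."""
    return list(range(1, min(MAX_TAKE, stones) + 1))


def random_playout(stones: int, rng: random.Random) -> bool:
    """Play uniformly random moves from a position with stones > 0.

    Returns True if the player to move at the start of the playout wins.
    """
    player = 0  # 0 = the player to move when the playout starts
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player == 0  # whoever just moved took the last stone
        player = 1 - player


def monte_carlo_move(stones: int, playouts_per_move: int = 2000, seed: int = 0):
    """Score each legal move by its win rate over random playouts; return the best."""
    rng = random.Random(seed)
    scores = {}
    for move in legal_moves(stones):
        remaining = stones - move
        wins = 0
        for _ in range(playouts_per_move):
            if remaining == 0:
                wins += 1  # this move takes the last stone and wins outright
            elif not random_playout(remaining, rng):
                wins += 1  # the opponent moves next; if they lose, we win
        scores[move] = wins / playouts_per_move
    best = max(scores, key=scores.get)
    return best, scores


best, scores = monte_carlo_move(5)
print(best, scores)
```

With 5 stones, the estimated win rates should land near 2/3 for taking one stone and about 1/2 for the other moves, so the search picks the move that leaves the opponent 4 stones, the known winning strategy for this game. Random playouts are a crude evaluator; AlphaGo guides its search with learned policy and value networks.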
Probability vs. Accuracy
Faced with evidence that LLMs are significantly inferior to other types of AI, proponents say that LLMs will “get better in the future.” However, Lodge notes, “For this claim to be true, we need to understand why LLMs would do this kind of work better, and that’s difficult.” No one can predict what GPT-4 will produce for a given prompt, and humans cannot explain the model’s behavior. That, says Lodge, is why prompt engineering is pointless. He also points out that AI researchers have a hard time proving that emergent properties of LLMs even exist, and an even harder time predicting them.
Perhaps the strongest argument for LLMs is induction. GPT-4 is larger than GPT-3 and excels at some language tasks, so wouldn’t an even larger model be better still? Is that really so? According to Lodge, “The problem is that GPT-4 struggles where GPT-3 struggled.” Mathematics is one example. GPT-4 is slightly better at addition than GPT-3, but it is still bad at multiplication and other mathematical operations.
Making a language model bigger doesn’t magically solve this endemic problem, and even OpenAI has said that bigger models aren’t the answer. The reason is a fundamental characteristic of LLMs that has been discussed on the OpenAI Forum: “Large language models are probabilistic in nature and produce highly probable outputs based on the patterns they observe in the training data. Mathematics and physics problems usually have only one correct answer, and the probability of generating this one answer can be very low.”
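The quoted point can be put in rough numbers. Suppose, purely as a simplifying assumption, that a model must emit an exact answer as a sequence of tokens and gets each token right independently with the same probability. The chance that the whole answer is exactly right is then that probability raised to the number of tokens, which shrinks quickly as answers get longer. The per-token accuracy of 0.95 below is a hypothetical figure, not a measured property of any GPT model, and real token errors are not independent.

```python
import random


def exact_answer_probability(per_token_accuracy: float, num_tokens: int) -> float:
    """Chance that all tokens are correct, assuming independent per-token errors."""
    return per_token_accuracy ** num_tokens


def simulate_exact_answers(per_token_accuracy: float, num_tokens: int,
                           trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of the formula: fraction of sampled answers that are fully correct."""
    rng = random.Random(seed)
    hits = sum(
        all(rng.random() < per_token_accuracy for _ in range(num_tokens))
        for _ in range(trials)
    )
    return hits / trials


for n in (1, 4, 8, 16):
    print(n, round(exact_answer_probability(0.95, n), 3))
```

Under these assumptions, an 8-token answer comes out exactly right only about two times in three, and a 16-token answer less than half the time.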
By contrast, AI based on reinforcement learning is much better at producing accurate results because it is a goal-seeking process. Reinforcement learning works repeatedly toward a desired goal and produces the best answer closest to that goal. “LLMs, on the other hand, are not designed to iterate or pursue a goal,” says Lodge. “They’re designed to give you a ‘good enough’ one-shot or few-shot answer.”
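For contrast with a one-shot answer, here is a minimal Python sketch of a goal-seeking feedback loop: an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning setups. It is illustrative only and is not how Diffblue or AlphaGo work. The agent repeatedly picks an action, observes a reward, and updates its running estimate of each action’s value, so its choices improve with feedback. The three reward probabilities are hypothetical.

```python
import random


def run_epsilon_greedy(true_means: list[float], steps: int = 5000,
                       epsilon: float = 0.1, seed: int = 0):
    """Learn which action pays best from repeated reward feedback."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running estimate of each action's average reward
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit the best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # hypothetical Bernoulli reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return values, counts


values, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda a: values[a])
print(best, [round(v, 2) for v in values], counts)
```

A one-shot system would commit to its first guess; here the repeated reward signal is what lets the agent settle on the best action (index 2, the one with reward probability 0.8).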
A ‘one-shot’ answer is the first answer the model generates by predicting a sequence of words from the prompt. In the ‘few-shot’ approach, additional examples or hints are provided to help the model make better predictions. LLMs can also give different answers to the same question because they build some randomness into generation to increase the likelihood of a better response.
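The randomness mentioned above is commonly controlled by a sampling temperature. The sketch below is a hedged illustration, not any specific vendor’s implementation: it turns hypothetical next-token scores (logits) into probabilities with a temperature-scaled softmax and samples from them. A temperature near zero almost always yields the top-scoring token, while higher temperatures let the same prompt produce different completions.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens: list[str], logits: list[float], temperature: float,
                 rng: random.Random) -> str:
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates and scores for completing "The capital of France is ..."
tokens = ["Paris", "Lyon", "Marseille"]
logits = [4.0, 1.5, 1.0]
rng = random.Random(0)

near_greedy = {sample_token(tokens, logits, 0.05, rng) for _ in range(200)}
creative = {sample_token(tokens, logits, 1.0, rng) for _ in range(200)}
print(near_greedy, creative)
```

At temperature 1.0 the top token still wins most of the time, but over 200 draws other tokens appear too, which is one way the same prompt can yield different answers.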
The LLM camp is not ignoring reinforcement learning. GPT-4 incorporates “reinforcement learning from human feedback (RLHF).” That is, human operators train the core model to prefer some answers over others, but this does not fundamentally change the answers the model generates in the first place. For example, Lodge said an LLM might generate the following answers to complete the sentence “Wayne Gretzky likes ice ____.”
1. Wayne Gretzky loves ice cream.
2. Wayne Gretzky likes ice hockey.
3. Wayne Gretzky likes ice fishing.
4. Wayne Gretzky likes ice skating.
5. Wayne Gretzky likes ice wine.
Here, a human operator might rank the answers on the reasoning that Wayne Gretzky, a legendary Canadian ice hockey player, is more likely to like ice hockey (or ice skating). The operators’ rankings, along with additional human-written responses, are used to train the model. Note that GPT-4 doesn’t pretend to know Wayne Gretzky’s actual preferences; it simply gives the most likely completion for the prompt. After all, LLMs are not designed to be highly accurate or consistent. Lodge noted that all of this means reinforcement learning will outperform generative AI when it comes to applying AI at scale.
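The ranking step can be sketched in a few lines. In published RLHF recipes such as OpenAI’s InstructGPT work, a labeler’s ranking of several completions is split into every (preferred, rejected) pair, and a reward model is trained to score the preferred completion higher using the loss -log(sigmoid(r_preferred - r_rejected)); the language model is then fine-tuned against that reward model. The Python below shows only the pair construction and the loss, with a hypothetical ranking and hypothetical reward scores; it is not GPT-4’s training code.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked: list[str]) -> list[tuple[str, str]]:
    """Split a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the better-ranked item comes first
    return list(combinations(ranked, 2))


def pairwise_loss(rewards: dict[str, float], pairs: list[tuple[str, str]]) -> float:
    """Mean of -log(sigmoid(r_preferred - r_rejected)) over all pairs."""
    total = 0.0
    for preferred, rejected in pairs:
        margin = rewards[preferred] - rewards[rejected]
        total += math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return total / len(pairs)


# A hypothetical human ranking of the five completions, best first.
ranking = ["ice hockey", "ice skating", "ice fishing", "ice wine", "ice cream"]
pairs = ranking_to_pairs(ranking)

# Hypothetical reward-model scores: one set agrees with the ranking, one reverses it.
agreeing = {"ice hockey": 2.0, "ice skating": 1.5, "ice fishing": 0.0,
            "ice wine": -0.5, "ice cream": -1.0}
reversed_scores = {k: -v for k, v in agreeing.items()}

print(len(pairs), pairwise_loss(agreeing, pairs), pairwise_loss(reversed_scores, pairs))
```

Training pushes the reward model toward scores like the agreeing set, where the loss is lower. This reshapes preferences among completions the model can already produce, which is the limitation Lodge points to.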
Applying Reinforcement Learning to Software
What about software development? Many developers report productivity gains from generative AI-powered tools such as GitHub Copilot and Amazon CodeWhisperer. These tools predict what code is likely to come next based on the code before and after the insertion point in the integrated development environment (IDE).
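Completing code at a cursor using both the preceding and following code is often called fill-in-the-middle (FIM). A minimal, hypothetical Python sketch of how such a prompt can be assembled follows. The sentinel strings are placeholders modeled on those some open code models use; the source does not describe the prompt formats of Copilot or CodeWhisperer, so treat them as assumptions. The model would be asked to generate the text that belongs after the final sentinel.

```python
# Hypothetical sentinel markers; real models define their own special tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(source: str, cursor: int) -> str:
    """Arrange the code before and after the cursor in prefix-suffix-middle order."""
    prefix = source[:cursor]
    suffix = source[cursor:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


source = "def add(a, b):\n    \n\nprint(add(1, 2))\n"
cursor = source.index("\n    ") + len("\n    ")  # cursor sits inside the empty function body
prompt = build_fim_prompt(source, cursor)
print(prompt)
```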
In fact, David Ramel of Visual Studio Magazine reported that the latest version of Copilot already generates 61% of Java code. For those who fear that the software developer profession is going away, these tools still require “human oversight” to check the generated code and edit it so that it compiles and runs properly. Autocompletion has been a signature IDE feature since the early days of IDEs, and code generators such as Copilot have greatly increased its usefulness. But this is not the large-scale autonomous coding that actually writing 61% of Java code would require.
Lodge said reinforcement learning can accurately perform large-scale unsupervised coding. Of course, Lodge has a reason for saying so: in 2019, Diffblue released Diffblue Cover, a commercial unit-test-writing tool based on reinforcement learning. Cover writes entire unit tests without human intervention, making it possible to automate complex, error-prone tasks at scale.
Given these facts, could Lodge’s argument be called biased? Of course. But Lodge also has a wealth of experience to support the claim that reinforcement learning can outperform generative AI in software development. Diffblue currently uses reinforcement learning to explore all possible test methods, automatically writes test code for each method, and selects the most appropriate test among those written. On average, the tool takes about one second to generate tests for each method.
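The article describes Cover’s approach only at a high level, so the following is a deliberately simplified, hypothetical Python sketch of the generate, run, and select pattern rather than Diffblue’s reinforcement learning method. It generates random candidate inputs for a small method, runs each one to record the observed output and the branches it executed, greedily keeps candidates that add new branch coverage, and writes those as test code. The `clamp` method and its branch labels are invented for illustration.

```python
import random

ALL_BRANCHES = frozenset({"below", "inside", "above"})


def clamp(x: int, lo: int, hi: int, trace: set[str]) -> int:
    """Hypothetical method under test; records which branch ran in `trace`."""
    if x < lo:
        trace.add("below")
        return lo
    if x > hi:
        trace.add("above")
        return hi
    trace.add("inside")
    return x


def generate_candidates(n: int, seed: int = 0) -> list[tuple[int, int, frozenset]]:
    """Run random inputs and record (input, observed output, branches executed)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        x = rng.randint(-20, 20)
        trace: set[str] = set()
        result = clamp(x, 0, 10, trace)
        candidates.append((x, result, frozenset(trace)))
    return candidates


def select_tests(candidates):
    """Greedily keep candidates that add branch coverage until nothing new is gained."""
    covered: set[str] = set()
    selected = []
    while covered != ALL_BRANCHES:
        best = max(candidates, key=lambda c: len(c[2] - covered))
        if not best[2] - covered:
            break  # no remaining candidate reaches an uncovered branch
        selected.append(best)
        covered |= best[2]
    return selected


def render_tests(selected) -> str:
    """Emit pytest-style test functions that pin the observed behavior."""
    lines = []
    for i, (x, result, _) in enumerate(selected):
        lines.append(f"def test_clamp_{i}():")
        lines.append(f"    assert clamp({x}, 0, 10, set()) == {result}")
    return "\n".join(lines)


tests = select_tests(generate_candidates(200))
print(render_tests(tests))
```

The generated assertions pin down the method’s current behavior; scaling this kind of search and selection to thousands of real methods is where Lodge argues reinforcement learning pays off.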
If your goal is to automate the writing of 10,000 unit tests for a program no one understands, Lodge says reinforcement learning is the only realistic solution. “LLMs are no match. At this scale, there’s no way for humans to effectively supervise and correct the code, and making the model bigger and more complex doesn’t solve the problem.”
The conclusion is this: the most powerful advantage of LLMs is that they handle natural language. They can also perform language tasks they were not explicitly trained on, which makes them useful for many tasks, including content creation such as copywriting. “But that doesn’t mean LLMs can replace AI models based on reinforcement learning,” says Lodge. “Reinforcement learning is more accurate, more consistent, and works at scale.”
Source: ITWorld Korea by www.itworld.co.kr.
*The article has been translated based on the content of ITWorld Korea by www.itworld.co.kr.