A definition of large language models and their relationship to generative AI

ChatGPT, which appeared in November 2022, brought into the mainstream the idea that businesses and consumers could use generative AI to automate tasks, support creative work, and even write software.

Tools such as OpenAI’s ChatGPT or Google’s Bard can summarize emails or ongoing conversations. AI can also help when you need to polish your resume with more fluent language and sharper bullet points. And if you want ideas for a new marketing or advertising campaign, generative AI will come to your rescue.

ChatGPT is short for “Chat Generative Pre-trained Transformer.” The chatbot is built on GPT, a large language model (LLM). An LLM is a type of computer algorithm that processes natural-language input and predicts the next word based on what it has already seen. Then it predicts the word after that, and the one after that, until the answer is complete.

In its simplest terms, an LLM is an engine that predicts the next word.
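That next-word loop can be sketched in a few lines of Python. Note this is a toy illustration: the probability tables and words below are invented, whereas a real LLM derives its probabilities from billions of learned parameters.

```python
# Toy next-word probability tables (hypothetical values, for illustration only).
# A real LLM computes such probabilities from billions of parameters.
NEXT_WORD = {
    ("I", "ate"): {"cereal": 0.5, "toast": 0.3, "rice": 0.2},
    ("ate", "cereal"): {"today": 0.6, ".": 0.4},
}

def predict_next(context):
    """Return the most probable next word given the last two words of context."""
    candidates = NEXT_WORD.get(tuple(context[-2:]), {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def generate(prompt, max_words=5):
    """The autoregressive loop: repeatedly append the predicted next word."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(words)
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)
```

Calling `generate("I ate")` walks the table word by word until no prediction is available, which is exactly the “predict, append, repeat” cycle the article describes.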

ⓒ Getty Images Bank

Along with OpenAI’s GPT-3 and GPT-4, popular LLMs include Google’s LaMDA and PaLM (the basis of Bard), Hugging Face’s BLOOM and XLM-RoBERTa, Nvidia’s NeMo, XLNet, and Co:here, as well as open models such as GLM-130B.

In particular, open-source LLMs are gaining popularity because they allow more customizable models to be developed at lower cost. Since the launch of Meta’s LLaMA (Large Language Model Meta AI) in February, development activity based on open-source LLMs has exploded.

An LLM is a type of AI trained on masses of articles, Wikipedia entries, books, Internet-based resources, and other input so that it can answer natural-language questions like a human. The amount of data involved is truly enormous. However, because vendors want to tailor LLMs to specific uses, and those uses don’t require the massive data sets behind today’s most popular models, LLMs may eventually shrink rather than grow.

For example, according to one report, Google’s new PaLM 2 LLM, announced earlier this year, was trained on 3.6 trillion tokens (strings of words), nearly five times as many as its predecessor from just a year earlier. The larger data set allows PaLM 2 to handle more advanced coding, math, and creative-writing tasks.
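A token is roughly a word or word fragment. A crude way to get a feel for token counting is to split text on words and punctuation; the splitter below is a hypothetical approximation, since real LLM tokenizers (such as byte-pair encoders) break words into subword units, so actual counts differ.

```python
import re

def naive_tokenize(text):
    # Crude approximation: split into word runs and individual
    # punctuation marks. Real tokenizers use subword vocabularies.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = naive_tokenize("What I ate for lunch today...")
# Six words plus three punctuation marks -> nine tokens.
```

Even this rough count makes the scale concrete: 3.6 trillion tokens corresponds to hundreds of billions of pages of text.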

So what exactly is an LLM?

ⓒ Getty Images Bank

An LLM is a machine-learning neural network trained on sets of input and output data. The text is typically unlabeled, so models often use self-supervised or semi-supervised learning methodologies. When information or content is fed into the LLM, the output is the next word the algorithm predicts. The input can be a company’s proprietary data or, as in the case of ChatGPT, whatever data is scraped directly from the Internet.
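The self-supervised setup can be illustrated with a small sketch: the training “label” for each window of raw text is simply the word that follows it, so no human annotation is required. The function name and window size here are illustrative choices, not part of any real training pipeline.

```python
def make_training_pairs(text, context_size=3):
    """Self-supervised labeling: for each window of `context_size`
    words, the target output is the very next word in the raw text.
    No human-written labels are needed."""
    words = text.split()
    pairs = []
    for i in range(len(words) - context_size):
        context = words[i:i + context_size]
        target = words[i + context_size]
        pairs.append((context, target))
    return pairs

pairs = make_training_pairs("the cat sat on the mat")
```

Every sentence in the training corpus yields many such (context, next word) examples, which is why raw, unclassified text is enough to train these models.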

To train the LLM to use the right data, you need to use a huge, expensive server farm that acts as a supercomputer.

LLMs are controlled by millions, billions, or even trillions of parameters. (Think of parameters as what helps an LLM choose between possible answers.) OpenAI’s GPT-3 has 175 billion parameters, and its latest model, GPT-4, is said to have 1 trillion parameters.

For example, type “What I ate for lunch today was…” into an LLM prompt window, and the LLM might suggest “cereal,” “rice,” or “steak tartare.” There is no 100% correct answer; there are only probabilities based on the data already in the model. The LLM completes the sentence with the most probable answer based on its existing data: “cereal.” But because an LLM is a probability engine, it assigns a percentage to each possible answer. “Cereal” might have a 50% chance, “rice” a 20% chance, and “steak tartare” a 0.005% chance of being the answer.

“The fact that the LLM learns to do this task is key,” said Yoon Kim, an MIT assistant professor who studies machine learning, natural language processing, and deep learning. “How it does so is different from humans; the probabilities are specified through a sufficiently large training set.”

However, one thing to keep in mind is that garbage in means garbage out. In other words, if the information an LLM ingests is biased, incomplete, or otherwise undesirable, its responses may likewise be unreliable, bizarre, or even objectionable. Data analysts call these wildly off-base responses hallucinations.

Jonathan Siddharth, CEO of Turing, a U.S. company that uses AI to recruit, hire, and train software engineers remotely, said, “Hallucinations occur because an LLM, in its most basic form, has no internal representation of the world. There is no concept of truth. An LLM makes a statistical estimate, predicting the next word based on what it has seen so far.”

Some LLMs can even learn on their own from Internet-based data, which can take them well beyond their initial development goals. Microsoft’s Bing, for example, uses GPT-3 as a base, but it also queries the search engine, analyzes the first 20 or so results, and presents responses drawing on both the LLM and the Internet.

“Some models learn one programming language and then automatically generate code in another programming language they have never seen before,” Siddharth said. “It’s the same with natural language. A model can generate sentences in French without ever being trained on French.”

“It’s as if an ability emerged out of nowhere,” Siddharth said. “We don’t understand how these neural networks work. It’s scary and exciting at the same time.”

Another problem with LLMs and their parameters is that they can pick up unintended biases, introduced both by LLM developers and by the self-supervised data collected from the Internet.

Does LLM bias exist?

ⓒ Getty Images Bank

Systems like ChatGPT, for example, are very likely to give gender-biased answers based on the data fed in from the Internet and from programmers, according to Sayash Kapoor, a PhD candidate at Princeton University’s Center for Information Technology Policy.

“We tested ChatGPT for implicit bias, where gender is not explicitly mentioned and only pronoun information is included,” Kapoor said. “If you replace the word ‘her’ in a sentence with ‘he,’ ChatGPT is one-third less likely to make a mistake.”

Source: ITWorld Korea