Global Column | Data for AI/ML training, how much is enough?

The limit and the future of artificial intelligence (AI) is ultimately people. Whether you look forward to or fear the emergence of human-like robots, in the end the problem comes down to humans. In AI and data science, the ideal is to combine the strengths of humans and machines. For a while, proponents of the AI industry have tended to focus on the machine side of the equation. But according to Elena Dyachkova, a data scientist at Spring Health, data and the machines behind it are useful only to the extent that humans understand them. Let's take this topic further.

Incomplete data and rational decision making

Consider a conversation between Dyachkova and Sarah Catanzaro, general partner at Amplify Partners. “Data experts often miss the value of reports and analytics that are sloppy at first but keep getting better. Many decisions don’t require very accurate insights. There’s no need to be ashamed of a little bit of data,” Catanzaro said.

Her reminder that we don’t need accurate data to make decisions makes sense. Gary Marcus, founder of Geometric Intelligence, a machine learning company acquired by Uber in 2016, has said that a key factor in evaluating AI, machine learning, and deep learning is whether pattern recognition tools work well when only rough results are needed. Low cost and perfect results are not what determines success. Despite these points, we keep working to acquire more and more data in order to build more powerful AI applications. The expectation is that, given enough data, a machine learning model will deliver something better than “rough results.”

Unfortunately, the reality is not so simple. More data can help in many applications, but often more data is not what is needed. Someone who can better understand the data we already have is much more helpful. On this point, Dyachkova said, “Eighty percent of product analysis is done quickly, even if it is somewhat lax. But the ability to decide when to perform such an analysis requires a very deep understanding of statistics.” Data scientist Vincent Dowling puts this more clearly. “The biggest advantage of having a seasoned analyst/scientist is that you can determine the accuracy of the data you need to make decisions,” he said.

What the two have in common is a focus on how decisions are made. Both remarks show that the experience of the practitioner looking at the data matters more than the data itself. Machines cannot imitate the experience that humans accumulate, however inefficiently. As The Guardian pointed out in an article, AI is expected to let machines find patterns in data and make decisions faster than humans. But what if it simply makes bad decisions faster? That is likely to happen in practice when humans give up ownership of thinking about the data. The machine then ends up making the decisions by itself.

Less data, more knowledge

But in a real project, involving more people in these tasks is harder than leaving them out. According to Manjunath Bhat, Gartner research vice president, AI is shaped by the data humans choose to train their machines on. In turn, the results of the algorithms influence the data we use to make decisions. “People consume reality in the form of data, but data can be changed, transformed, and given a name in a form that is convenient to consume. In the end, humans are forced to live within a highly manipulated view of the world, within a limited range,” he said.

“Successful machine learning projects require data,” said Eugene Yan, a scientist at Amazon. “You need more powerful pipelines to support the flow of data. But most importantly, you need to give each piece of data a name or label.” However, properly labeled data cannot be obtained without skilled people. To label data, you need to understand it.

This brings us back to a point Gartner analyst Svetlana Sicular made more than a decade ago. “In companies, there are a lot of people who can read between the lines of the business,” she said. “These people are well placed to find the right questions to ask of corporate data.” If something was lacking at the time, as Dyachkova pointed out, it was an understanding of statistics. In other words, what is needed is the ability to judge ‘how much is enough’ to produce a sufficiently meaningful result.
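As a rough illustration of what judging ‘how much is enough’ can look like, the sketch below uses the standard normal-approximation formula for the margin of error of an estimated proportion. The function names and the example numbers are hypothetical and chosen only for illustration; they do not come from any of the people quoted here.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate half-width of a confidence interval for a proportion.

    Uses the normal approximation; z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def samples_needed(target_margin: float, p_hat: float = 0.5, z: float = 1.96) -> int:
    """Smallest sample size whose approximate margin of error is at most target_margin.

    p_hat = 0.5 is the worst case (largest variance) for a proportion.
    """
    return math.ceil(z ** 2 * p_hat * (1.0 - p_hat) / target_margin ** 2)


if __name__ == "__main__":
    # Quadrupling the sample size only halves the margin of error.
    for n in (100, 400, 1600, 6400):
        print(f"n={n:5d}  margin=+/-{margin_of_error(0.5, n):.4f}")

    # Hypothetical decision: is a rough +/-5 points good enough?
    print("samples for +/-5 points:", samples_needed(0.05))
```

Under these assumptions, each halving of the margin of error costs four times as much data. If a decision would be the same whether a rate is 45% or 55%, a few hundred observations may already be enough, and recognizing that is a matter of statistical judgment, not data volume.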

This is why data science is so difficult. It is also why ‘talent’ consistently ranks first in surveys asking about the difficulties of adopting AI/ML. The problem could be summed up simply as a shortage of data scientists, but that may not be what we should be most concerned about right now. The real gap is a lack of basic understanding of statistics, mathematics, and the company’s own business.

Source: ITWorld Korea
