As demand for corporate data innovation grows, graphics processing unit (GPU)-based artificial intelligence (AI) analysis is spreading. GPUs deliver computational performance that can shrink the computing infrastructure footprint by more than 40%, but the volume of data to be processed grows by more than 50%. GPU-based analysis requires high bandwidth and fast response times, and if resources in any part of the pipeline fall short, a severe input/output (I/O) bottleneck can occur.
However, adding computing resources every time a performance issue arises is inefficient. In particular, it is difficult to satisfy both performance and scalability when the infrastructure runs on legacy storage. As the number of GPU servers in data centers grows, so do the requirements for high-performance storage.
◇ Increasing demand for data utilization
According to a January 2020 survey by the American AI Research Association, 85% of enterprise infrastructure and operations leaders said they plan to use AI in their infrastructure within the next two years. Many are interested in leveraging AI applications, but are often unprepared for the storage requirements and data management issues posed by the growing datasets of large-scale machine learning deployments.
The stages of a data management workflow, such as data collection, preparation, inference, model training, and archiving, each have their own compute, storage, and networking requirements. This can create silos and increase cost and time investment. The collection stage must be able to ingest data from various sources over multiple protocols. The preparation phase requires high performance, the inference phase requires low latency, and model training requires both. Every step demands massive scaling and automated data management, yet budgets cannot grow indefinitely to match.
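The mismatch between per-stage requirements can be made concrete with a small sketch. The stage names and capability labels below simply restate the requirements described above; they are illustrative, not a formal taxonomy.

```python
# Illustrative mapping of workflow stages to the requirements named in
# the text. The labels are descriptive placeholders, not product terms.
STAGE_REQUIREMENTS = {
    "collection":     ["multi-protocol ingest"],
    "preparation":    ["high throughput"],
    "model_training": ["high throughput", "low latency"],
    "inference":      ["low latency"],
    "archiving":      ["low-cost capacity"],
}

def stages_requiring(capability):
    """List the stages that need a given capability. Because several
    stages share a capability, building separate siloed infrastructure
    per stage duplicates cost."""
    return [stage for stage, needs in STAGE_REQUIREMENTS.items()
            if capability in needs]

low_latency_stages = stages_requiring("low latency")
```

A shared platform that satisfies the union of these requirements avoids provisioning each capability several times over.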
Efficient data management facilitates the adoption of new business models, increases corporate competitiveness, and enhances customer experience and loyalty. By shortening the time to market for products and services, companies can reduce costs and increase profits.
◇ ‘Object Storage’, a solution for implementing a data lake
If you want to derive insights from all your data, you need to know how the data flows. Traditional methods, however, take a long time and offer no guarantee that the collected data is truly ‘all’ of it. It is a different story when corporate data can be approached ‘in one place’. A ‘data lake’ for data storage and management prevents data silos and excels as a ‘central supply point’ for collecting all kinds of data for analysis. Data lakes also serve as self-service analytics platforms, allowing businesses to store and analyze information without a predetermined purpose. In particular, real-time data generated at industrial sites such as manufacturing and communications can be combined to drive corporate data innovation.
As data grows, so does the cost of infrastructure, so enterprises are migrating data to the cloud, which is highly flexible and scalable. Even so, managing the ever-increasing volume of data, including edge data pouring in from industrial sites such as manufacturing and communications, remains difficult. An infrastructure that stores data right where it is needed and reduces data storage and management costs is urgently needed.
A growing number of businesses need object storage, because it is the most cost-effective way to implement a data lake strategy: storing all data in one place, retrieving it when needed, and processing it at scale. Object storage manages data as objects rather than files or blocks, and supports all types of structured and unstructured data.
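The object model can be pictured as a flat namespace where each payload gets a unique ID and free-form metadata, instead of a path in a directory tree or a position in fixed-size blocks. The following is a minimal conceptual sketch in Python; it is not any vendor's API, and the class and method names are invented for illustration.

```python
import uuid

class ObjectStore:
    """Toy object store: a flat namespace mapping unique IDs to
    payloads plus arbitrary key/value metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store any payload (structured or unstructured) with
        metadata; return its unique object ID."""
        obj_id = str(uuid.uuid4())
        self._objects[obj_id] = {"data": data, "metadata": metadata}
        return obj_id

    def get(self, obj_id: str) -> bytes:
        return self._objects[obj_id]["data"]

    def find(self, **criteria):
        """Metadata-driven retrieval: select objects by attributes,
        with no directory hierarchy to traverse."""
        return [oid for oid, obj in self._objects.items()
                if all(obj["metadata"].get(k) == v
                       for k, v in criteria.items())]

store = ObjectStore()
oid = store.put(b'{"temp": 21.5}', source="iot-sensor", site="line-3")
line3_ids = store.find(site="line-3")
```

Because retrieval keys on metadata rather than location, the same pool can hold IoT readings, video, and historical records side by side, which is what makes the single-data-lake pattern workable.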
Businesses can use new types of data such as Internet of Things (IoT) sensor data, videos, and images, as well as the vast amounts of historical data accumulated within the company, to find the value and insights their business needs. For example, unlike the financial sector, where real-time data processing is well established, the manufacturing sector has struggled with real-time processing and analysis. With an object storage-based data lake, however, that data can be leveraged directly with simple analytics.
A data lake requires both high performance and large-scale data storage, and the two must be kept in balance: delivering performance while continuing to accept data. High-performance storage that combines the capacity advantages of object storage with speed is the best choice for companies that want to put structured, semi-structured, and unstructured data in one place, take it out as the purpose demands, and analyze the data streaming every second from a large production line in real time.
◇ Integrated AI/ML solution that supports all stages of data management
Hyosung Information System’s HCSF (Hitachi Content Software for File) is an ultra-high-performance file storage solution that integrates a high-performance parallel file system with object storage, optimized for HPC, AI/ML analysis, and GPU-accelerated workloads. HCSF combines the capacity of object storage with the speed of a distributed file system and adds cloud functions; support for both file and object protocols makes data collection easy. The distributed file system provides high performance and low latency in the data preparation, model training, and inference phases. In addition, the object storage function provides large-capacity storage at an economical cost, along with powerful metadata-based data management automation.
Built-in intelligent metadata-driven data automation creates a single capacity pool in which compute and storage can scale independently. Data can move between on-premises and public cloud storage for cost savings, compliance, and business continuity. HCSF is also fast: 3x faster than local flash drives and 10x faster than traditional all-flash arrays, maximizing the utilization of computing resources. Performance also improves as nodes are added.
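The metadata-driven automation described above can be thought of as policies that scan object metadata and pick candidates to move between tiers, for example from the fast file tier to economical object or cloud capacity. The sketch below is a simplified, hypothetical illustration of such a policy, not HCSF's actual interface; the field names and tier labels are invented.

```python
from datetime import datetime, timedelta, timezone

def select_for_tiering(catalog, max_age_days=30, now=None):
    """Hypothetical tiering policy: flag objects on the fast 'flash'
    tier that have not been accessed within `max_age_days`, so they
    can move to cheaper object/cloud storage."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [obj["name"] for obj in catalog
            if obj["tier"] == "flash" and obj["last_access"] < cutoff]

# Toy metadata catalog standing in for the store's object index.
catalog = [
    {"name": "train-set-01", "tier": "flash",
     "last_access": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"name": "hot-model", "tier": "flash",
     "last_access": datetime.now(timezone.utc)},
]
cold_objects = select_for_tiering(catalog)
```

Running such policies over a single metadata index, rather than per-silo scripts, is what lets capacity and compute scale independently while keeping cold data cheap.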
It analyzes large amounts of data accurately, serves petabyte (PB)-scale datasets easily, and processes them smoothly regardless of file size. As a single data lake, HCSF enables integrated search and auditing; fast search reduces costs and shortens audit time. It reduces risks such as missing relevant data and helps respond appropriately to rapidly changing global regulations. HCSF is easy to deploy and manage, lowering total cost of ownership. It removes silos and redundant copies, moving data seamlessly into a single store with built-in backup for the entire data pipeline, and offers high flexibility in the public cloud.
◇ Demonstrating value in high-performance industrial workloads
With HCSF, companies can launch AI/ML analysis and GPU-accelerated workflow projects immediately. In a new competitive environment that demands AI and ML, they can gain the edge by running more models with more complex algorithms sooner than rivals.
In the financial services sector, accurate and rapid data analysis, safe data protection, and versatile data processing support regulatory compliance and risk management, while real-time customer data analysis improves business performance. AI and ML backed by HCSF can process hundreds of technical indicators across datasets thousands of times larger.
Banking and the wider financial industry are among the most heavily regulated sectors; as information silos multiply, governance breaks down, making it difficult to respond to changed regulations or produce correct data for audits. HCSF’s object storage is a strong archiving and compliance solution, supporting thorough data classification and safeguarding customer data. One global card company adopted HCSF as high-performance storage for new analytical workloads, such as a real-time fraud prevention system, together with high-performance data protection.
In life science fields such as bioinformatics, genomics, and precision medicine, adopting HCSF is boosting data analysis performance. HCSF suits a wide variety of data types, supporting the data profiles of highly specialized tools for genomics, proteomics, metabolomics, bioimaging, neurological studies, and more.
When rapidly growing data volumes drive up costs, HCSF minimizes storage costs and provides hybrid cloud capabilities that keep data accessible at any time, enabling efficient and economical data storage. It also applies to high-performance workloads in other fields, such as analyzing geological data to inform oil and gas exploration and production.
Hyosung Information System plans to target the domestic high-performance storage market on the strength of HCSF project references proven at global companies, and to help customers achieve digital transformation through AI/ML innovation.
Reporter Lee Hyang-seon [email protected]
Source: 전체 – 넥스트데일리 by www.nextdaily.co.kr.