4-19. High-throughput materials data collection and database fusion techniques and applications

Chaofang Dong*, Yucheng Ji, Ni Li, Min Ao, Li Wang, Kui Xiao, Xiaogang Li

Beijing Advanced Innovation Center for Materials Genome Engineering, National Materials Corrosion Data Center, University of Science and Technology Beijing, Beijing 100083, China

Abstract: The unstable quality and limited durability of high-performance materials hinder the domestic production of materials for major engineering projects and equipment. The root cause is the low control precision and large fluctuation of composition and structure at the microscale. To address these challenges, key big-data technologies from materials genome engineering are adopted. Through a data chain of data accumulation, data mining and modeling, simulation, and data sharing services, an evaluation model that combines microscopic and macroscopic performance, based on high-throughput data collection, is developed to optimize experimental evaluation techniques and thereby speed up the research and development of new materials.

Taking high-speed train materials in advanced rail transit equipment as the entry point, a materials performance evaluation model was developed based on data accumulation, data mining and modeling, simulation, and data sharing services. The model yields efficient and economical experimental schemes and shortens the performance evaluation period for domestic materials, which accelerates the application of new materials. The intrinsic relationships among material composition, structure, and performance were also studied. Key basic data were accumulated for the aluminum alloys and steels used in high-speed trains, mainly to meet export demand under the Belt and Road Initiative. These data lay the foundation for materials processing and for the development of new protective processes.

This paper focuses on the massive experimental and computational data obtained for typical high-speed train materials, and studies distributed analysis, deep mining, automatic storage, cross-cluster scheduling, and data recommendation techniques. With these techniques, we have developed a service support platform for materials structure-performance data collection and fusion in materials genome engineering (MGE) research. The resulting MGE big-data analysis PaaS platform, which comprises a basic data layer, a data processing layer, and an application service layer, can be applied to the storage, computation, analysis, query, and development of massive materials data. A platform for the integration, mining, and application of multi-source heterogeneous materials data was built, and detection indexes for corrosion fatigue and stress corrosion of aluminum alloys were established. In addition, a basic database system was developed that supports cross-database retrieval and comparative analysis against manufacturing process data. A set of big-data experimental evaluation techniques, based on online high-throughput data acquisition and combining the microscopic and macroscopic performance of materials, has been developed to provide an example for accelerating the application of new materials.
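As a rough illustration of the cross-database retrieval mentioned above, the following Python sketch joins a property database with a manufacturing process database. It is only a sketch under stated assumptions: SQLite stands in for whatever database engine the platform actually uses, and the table names, column names, and values are hypothetical placeholders rather than the platform's schema or measured data.

```python
import sqlite3

# The main connection stands in for the materials property database.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for the manufacturing process database.
conn.execute("ATTACH DATABASE ':memory:' AS process")

# Hypothetical schemas, invented for illustration only.
conn.execute(
    "CREATE TABLE main.corrosion_tests (batch_id TEXT, alloy TEXT, scc_index REAL)"
)
conn.execute("CREATE TABLE process.heat_treatment (batch_id TEXT, temper TEXT)")

# Placeholder records; these are not real measurements.
conn.executemany(
    "INSERT INTO main.corrosion_tests VALUES (?, ?, ?)",
    [("B001", "alloy-A", 0.12), ("B002", "alloy-A", 0.30), ("B003", "steel-S", 0.05)],
)
conn.executemany(
    "INSERT INTO process.heat_treatment VALUES (?, ?)",
    [("B001", "temper-1"), ("B002", "temper-2")],
)

# Cross-database query: match each test result to its processing record by batch.
rows = conn.execute(
    """
    SELECT c.batch_id, c.alloy, h.temper, c.scc_index
    FROM main.corrosion_tests AS c
    JOIN process.heat_treatment AS h ON h.batch_id = c.batch_id
    ORDER BY c.batch_id
    """
).fetchall()

for row in rows:
    print(row)
```

Batches without a processing record (here the placeholder `B003`) drop out of the inner join; a `LEFT JOIN` would keep them with an empty processing field, which may be preferable when auditing data completeness.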

In materials computation, experiment, and data mining, the process-structure-performance correlation of materials is combined with computer-aided design, promoting the application of the materials genome engineering concept to design for manufacturing. Considering the uncertainty of multi-scale materials modeling, a decision support system is established in which multiple machine learning algorithms assist multi-objective system design. Multi-source heterogeneous materials data are collected in a standardized way, and experimental test methods are combined with data-science modeling and simulation. By quantifying the spatial statistics of microstructures, a data chain for modeling and simulation is formed. This provides a new method for establishing the relationships among the microstructure, structure, and properties of the aluminum alloy for underframe bolsters, the steel for bogies, and the protective coatings used in high-speed trains, and for evaluating their performance.
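One common way to quantify the spatial statistics of a microstructure is the two-point autocorrelation of a segmented phase map. The Python sketch below assumes this statistic and periodic boundary conditions; the abstract does not say which statistics the authors use, so it should be read as an illustration, not as their method. The function name and the small checkerboard map are hypothetical.

```python
import numpy as np


def two_point_autocorrelation(phase_map):
    """Periodic two-point autocorrelation of a binary phase map.

    Entry [i, j] is the probability that a point and the point shifted by
    (i, j) pixels both lie in phase 1, assuming periodic boundaries.
    """
    m = np.asarray(phase_map, dtype=float)
    spectrum = np.fft.fftn(m)
    # Correlation theorem: autocorrelation = inverse FFT of |FFT|^2.
    return np.fft.ifftn(spectrum * np.conj(spectrum)).real / m.size


# Hypothetical 4x4 two-phase microstructure (1 = phase of interest).
micro = np.array(
    [
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
    ]
)

stats = two_point_autocorrelation(micro)
volume_fraction = micro.mean()

print("volume fraction:", volume_fraction)
print("zero-shift correlation:", stats[0, 0])
```

At zero shift the statistic reduces to the volume fraction of the phase, which gives a quick sanity check before such a map is passed to a downstream model.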

Keywords: High-throughput acquisition; Database; Data fusion; Machine learning.

High-throughput collection of materials structure-performance data and database fusion techniques and applications

Chaofang Dong*, Yucheng Ji, Ni Li, Min Ao, Li Wang, Kui Xiao, Xiaogang Li

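Institute for Advanced Materials and Technology, University of Science and Technology Beijing; Beijing Advanced Innovation Center for Materials Genome Engineering; National Materials Corrosion and Protection Scientific Data Center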

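Abstract: Problems with the quality stability and durability of high-performance materials constrain the domestic production of materials for major engineering projects and equipment. The root cause is the low control precision and large fluctuation of composition and microstructure at the microscale inside the material. To address these difficulties and challenges, key big-data technologies from materials genome engineering are adopted. By building a data chain of data accumulation, mining and modeling, simulation, and sharing services, an evaluation model is developed that combines the microscopic and macroscopic performance of materials on the basis of high-throughput data collection, optimizing experimental evaluation techniques in the service of accelerating new materials research and development.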

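Taking high-speed train materials in advanced rail transit equipment as the research entry point, and by building a data chain of data accumulation, mining and modeling, simulation, and sharing services, a big-data-based materials performance evaluation model was developed and efficient, economical experimental schemes were obtained through optimization, thereby shortening the performance evaluation cycle for domestic materials and accelerating demonstration applications of new materials. The intrinsic relationships among material composition, structure, and performance were studied, and key basic data were accumulated for the aluminum alloy for underframe bolsters and the steel for bogies that are needed for high-speed rail exports under the Belt and Road strategy, laying the foundation for materials processing and for formulating new protective processes.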

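This paper centers on the massive experimental and computational data obtained for typical high-speed train materials. It studies distributed analysis and deep mining techniques, automatic storage techniques, cross-cluster scheduling techniques, and proactive data service recommendation techniques, and develops a service support platform for materials structure-performance data collection and data fusion aimed at MGE research and development. The data fusion processing platform includes a basic data layer, a data processing layer, application service support, and application services; the completed MGE big-data analysis PaaS platform (an integrated cluster deployment scheme) can be applied to the storage, computation, analysis, query, and development of massive materials data. A platform for the integration, mining, and application of multi-source heterogeneous materials data was built; detection indexes such as corrosion fatigue and stress corrosion of aluminum alloys were studied; and a basic database system was developed that supports cross-database retrieval and comparative analysis with manufacturing and processing data. A set of big-data experimental evaluation techniques, based on online high-throughput data acquisition and combining the microscopic and macroscopic performance of materials, was developed to provide a demonstration for accelerating the application of new materials.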

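With respect to materials computational data, experimental data, and data mining, the process-structure-performance correlation of materials is combined with computer-aided design, promoting the application of the materials genome engineering concept in the development of design for manufacturing. The uncertainty of multi-scale materials modeling is taken into account, and a decision support system is established in which multiple machine learning algorithms assist multi-objective system design. Multi-source heterogeneous materials data are collected in a standardized way, and experimental test methods are combined with data-science modeling and simulation. By quantifying the spatial statistics of microstructures, a data chain for modeling and simulation is formed, providing a new method for establishing the relationships among the microstructure, structure, and properties of the aluminum alloy for underframe bolsters, the steel for bogies, and protective coating materials, and for evaluating their performance.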

Keywords: High-throughput acquisition; Database; Data fusion; Machine learning

Brief Introduction of Speaker
Chaofang Dong

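Professor and doctoral supervisor at the University of Science and Technology Beijing. Research centers on integrated computation of metal corrosion and the design of corrosion-resistant materials; based on the materials genome engineering concept, it explores the shift from experience-guided approaches to rational design and control methods. Selected for the Beijing Nova Program in 2009 and for the Ministry of Education's New Century Excellent Talents program in 2011; received the Ministry of Education's Fok Ying Tung Young Teacher Award in 2012 and funding from the National Science Fund for Excellent Young Scholars in 2012; chief scientist of a National Key R&D Program project in 2017. Has published more than 200 SCI papers, with an H-index of 27. Holds 20 Chinese national invention patents and 2 US patents, and has received one Second Prize of the National Science and Technology Progress Award and five First Prizes of provincial or ministerial science and technology progress awards.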

Email: cfdong@ustb.edu.cn