Wei Xu1,2,*, Chunguang Shen1, Chenchong Wang1,*, Xiaolu Wei1, Yong Li1, Sybrand van der Zwaag2
1. State Key Laboratory of Rolling and Automation, Northeastern University, Shenyang, Liaoning 110819, China;
2. Novel Aerospace Materials Group, Faculty of Aerospace Engineering, Delft University of Technology, 2629 HS Delft, The Netherlands
Abstract: With the development of the materials genome philosophy and data mining methodologies, machine learning (ML) has been widely applied to discovering new materials in various systems, including high-end steels with improved performance. Although some attempts have recently been made to incorporate physical features into the ML process, their effects have been neither systematically analysed nor experimentally validated with prototype alloys. To address this issue, a physical metallurgy (PM)-guided ML model was developed in which intermediate parameters, e.g., the equilibrium volume fraction (Vf) and the driving force (Df) for precipitation, were generated from the original inputs using PM principles and added to the original dataset vectors as extra dimensions that participate in and guide the ML process. This improves data quality and enriches the information content of the data, making the ML process more robust when dealing with small datasets. On this basis, a new materials design method is proposed that combines PM-guided ML regression, an ML classifier and a genetic algorithm (GA).
In the present work, composition and hardness data for various ultra-high-strength (UHS) stainless steels reported in the literature were collected and grouped according to their principal strengthening precipitates. The dataset was used to train a PM-guided support vector machine (SVM-PM) model and to design new UHS stainless steels with high hardness due to strengthening by the R-phase. With the main strengthening precipitate pre-defined, the ageing temperature and ageing time were selected as the main thermal processing parameters. Conventional ML database constructions take only the original inputs, e.g., composition and process parameters, and build a direct correlation with the target output properties. However, the microstructure is the essential link between the original inputs and the target properties and must therefore be considered in an appropriate way. In particular, the contribution of precipitation strengthening depends mainly on both the Vf and the average size of the precipitates, and the precipitate size is closely related to the nucleation kinetics, which are mainly controlled by the thermodynamic Df. Therefore, the intermediate PM parameters Vf and Df, which represent these microstructural features, were also introduced as model inputs. Vf and Df were calculated with Thermo-Calc® software using the TCFE9 database, and the experimental hardness was set as the output target. The R-phase dataset contained 102 samples in total: 80 randomly selected samples formed the training set, used to optimize the parameters of SVM-PM models with a radial basis function (RBF) kernel, and the remaining 22 samples formed the testing set, used to assess the generalization ability of the models. Given the very small amount of data in the present work (only 102 samples), the performance of a trained model varies greatly with the partition into training and testing sets.
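A minimal sketch of how the PM parameters extend each input vector and how an RBF-kernel regressor can be fitted on the result. It assumes Vf and Df have already been computed for every sample (the Thermo-Calc step is not reproduced). It also swaps in RBF kernel ridge regression for the support vector regression used in this work: both rely on the same RBF kernel, and the ridge version fits in a few lines of standard-library Python. In practice `sklearn.svm.SVR(kernel="rbf")` would be the closer match. The function names and the `gamma`/`lam` values are illustrative assumptions.

```python
import math

def augment(composition, ageing, vf, df):
    """Append the PM parameters to the original inputs.

    composition: alloying-element contents (wt.%); ageing: (T_age in degC, t_age in h);
    vf, df: equilibrium volume fraction and precipitation driving force, assumed to be
    precomputed elsewhere (e.g. with Thermo-Calc); this sketch does not compute them.
    """
    return list(composition) + list(ageing) + [vf, df]

def fit_scaler(rows):
    """Return a function that z-scores a row using the column means/stds of `rows`."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return lambda row: [(v - m) / s for v, (m, s) in zip(row, stats)]

def rbf(x, z, gamma):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(a, b):
    """Solve the dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def fit_rbf_kernel_ridge(X, y, gamma=0.5, lam=1e-3):
    """Stand-in for RBF-kernel SVR: kernel ridge regression on standardized features."""
    scale = fit_scaler(X)
    Xs = [scale(x) for x in X]
    y_mean = sum(y) / len(y)
    K = [[rbf(a, b, gamma) + (lam if i == j else 0.0) for j, b in enumerate(Xs)]
         for i, a in enumerate(Xs)]
    alpha = solve(K, [v - y_mean for v in y])  # (K + lam*I) alpha = y - mean(y)

    def predict(x):
        xs = scale(x)
        return y_mean + sum(a * rbf(xs, xi, gamma) for a, xi in zip(alpha, Xs))
    return predict
```

With the 80/22 split described above, `X` would hold the 80 augmented training vectors (composition, ageing parameters, Vf, Df) and `y` the measured hardness. `gamma` and `lam` would then be tuned, for example by cross-validation on the training set.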
Therefore, to better evaluate the generalization ability of the SVM-PM model, a 'multiple hold-out method' was employed: the dataset was randomly divided into training and testing sets 500 times to build 500 different SVM-PM models, and the mean and maximum of the evaluation function over all 500 models were taken as the evaluation indices. The prediction results are shown in Fig. 1. In this design exercise, the confidence in the alloys designed by the GA depended directly on the generalization ability of its objective function, i.e., the SVM-PM model, so models with a higher R² were preferred. However, a substantial number of models was needed to retain the model diversity associated with different partitions. To balance high generalization ability against good diversity, a criterion of R² > 95% was enforced in the design process, which retained the 155 best of the 500 models. For each of these 155 SVM-PM models, the GA was applied to find a new solution (composition, ageing temperature and ageing time) with maximal hardness, so that 155 new alloys could be designed. However, not all of the newly designed alloys had hardness values exceeding the maximum value in the original dataset (51 HRC); the 39 design results that did not were removed, leaving 116 optimal solutions. To narrow these down to a small number of prototype alloys for experimental validation, a classifier was applied. It was trained on the complete experimental dataset to identify solutions with hardness above 49 HRC and was then applied to the 116 optimal solutions to select those in the 'high hardness' category, i.e., those most likely to outperform existing alloys experimentally.
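The multiple hold-out evaluation and the R² > 95% model selection can be sketched as follows. This is a minimal illustration, not the authors' code. `fit` is a hypothetical placeholder for any routine that trains a regressor (such as the SVM-PM model) on a training split and returns a predictor. R² is assumed here to be computed on each test split. The 80-sample training size, the 500 repetitions and the 0.95 threshold are taken from the text.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((v - mean) ** 2 for v in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

def multiple_holdout(X, y, fit, n_splits=500, n_train=80, seed=0):
    """Fit one model per random train/test partition and record its test-set R^2.

    `fit(X_train, y_train)` is a hypothetical trainer returning a callable predictor.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    results = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]  # slices are copies
        model = fit([X[i] for i in train], [y[i] for i in train])
        r2 = r2_score([y[i] for i in test], [model(X[i]) for i in test])
        results.append((r2, model))
    return results

def summarize_and_select(results, threshold=0.95):
    """Return mean and max test R^2, plus the models whose R^2 exceeds the threshold."""
    scores = [r for r, _ in results]
    kept = [m for r, m in results if r > threshold]
    return sum(scores) / len(scores), max(scores), kept
```

The retained models are then the ones used as GA objective functions. Keeping many of them, rather than only the single best split, preserves the diversity across partitions that the text argues for.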
As a result, only 11 optimal solutions were classified as 'high hardness', and the remaining design results as 'low hardness'. Detailed examination of these 11 solutions reveals that their dispersion stems mainly from the ageing temperature, whereas their compositions fall clearly into two groups of very similar compositions, represented by Alloy 1 and Alloy 2, respectively. The compositions and ageing conditions of these two most promising alloys are given in Table 1.
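The 'high hardness' screening amounts to labelling the experimental data at 49 HRC, training a classifier and keeping only the candidates predicted as positive. The sketch below assumes that reading. It uses a k-nearest-neighbour vote as a plainly swapped-in stand-in for the support vector classifier used in this work, and all function names are hypothetical. Features should be on comparable scales (e.g. standardized) before distances are computed.

```python
def label_high_hardness(hardness, threshold=49.0):
    """Binary labels: True where the measured hardness (HRC) is above the threshold."""
    return [h > threshold for h in hardness]

def knn_classifier(X, labels, k=3):
    """Stand-in classifier: majority vote of the k nearest training samples."""
    def predict(x):
        dist = lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x))
        nearest = sorted(range(len(X)), key=dist)[:k]
        votes = sum(1 for i in nearest if labels[i])
        return votes * 2 > k  # strict majority
    return predict

def screen(candidates, classifier):
    """Keep only the GA solutions classified as 'high hardness'."""
    return [c for c in candidates if classifier(c)]
```

Applied to the 116 GA solutions, `screen` would return the subset classified as 'high hardness'. Any classifier exposing the same call signature could replace the k-NN vote.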
Alloy 1 had a composition similar to that of one of the original alloys with the maximum hardness in the dataset. Therefore, Alloy 2, with a relatively lower total alloy content, was chosen for experimental validation. The maximum hardness of Alloy 2 exceeded the maximum value in the original dataset (51 HRC, dashed line in Fig. 4). In addition to the successful design of a new alloy with high hardness, the predicted optimal ageing temperature and time were highly consistent with the results of the experimental optimization. This strongly indicates that the combined SVM-PM and GA model can design both the alloy compositions and the ageing conditions of UHS stainless steels accurately and efficiently.
Keywords: Alloy design; Machine learning; Physical metallurgy; Stainless steel
Fig. 1. Experimental values vs. values predicted by the SVM-PM model for 500 different partitions of the training and testing sets: (a) training set, mean result; (b) testing set, mean result; (c) training set, optimal result; (d) testing set, optimal result.
Fig. 1. Comparison of experimental values for UHS stainless steels with values predicted by the SVR-PM model
Table 1. Compositions and heat treatments of the designed UHS stainless steels
Table 1. Designed compositions and ageing conditions of Alloys 1 and 2, and the actual composition of the alloy produced according to the recommended composition of Alloy 2. Compositions are in wt.%; the ageing temperature T_age is in °C and the ageing time t_age in h.
Alloy | Fe | C | Cr | Ni | Co | Mo | T_age (°C) | t_age (h)
Alloy 1 | Balance | 0.090 | 12.00 | 6.00 | 11.50 | 5.30 | 500 | 3.7
Alloy 2 | Balance | 0.002 | 13.00 | 1.50 | 13.00 | 5.30 | 560 | 4.0
Actual | Balance | 0.004 | 13.20 | 1.54 | 12.90 | 5.49 | 520-600 | 0-6
Physical-metallurgy-guided machine learning and intelligent design of ultra-high-strength stainless steels
Wei Xu1,2,*, Chunguang Shen1, Xiaolu Wei1, Yong Li1, Chenchong Wang1, Sybrand van der Zwaag2
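1. State Key Laboratory of Rolling and Automation, Northeastern University; 2. Faculty of Aerospace Engineering, Delft University of Technology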
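Abstract: With the development of the Materials Genome Initiative and data mining, machine learning (ML) has been widely applied to the discovery and design of various new materials, such as ultra-high-strength stainless steels. However, ML is mainly a statistical approach that correlates input parameters (e.g., composition and process parameters) directly with output target properties and involves no guidance from physical metallurgy (PM). This confines materials design to a purely mathematical process and ignores the essential role of microstructure optimization, which may mislead the discovery of new materials or reduce design efficiency. To address this issue, this study developed a PM-guided ML model in which intermediate parameters, such as the equilibrium volume fraction of precipitates and the driving force for precipitation, are generated from the original inputs using PM principles and introduced into the original dataset, where they participate in and guide the ML process. This approach improves the quality of the dataset and enriches its information content. On this basis, this study proposes a new materials design method combining a PM-guided ML regressor, an ML classifier and a genetic algorithm; new UHS stainless steels were designed and experimentally validated, and the key factors affecting the ML model are discussed in detail.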
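The ML-based materials design process in this study comprises four steps: 1) establishing a composition-process-property database from literature searches and experiments; 2) building, on this database, a PM-guided support vector regression model (SVR-PM) linking the input parameters (composition and processing) to the output property (hardness); 3) using a genetic algorithm (GA) to search a large solution space for the composition and processing parameters giving the best hardness; and 4) establishing an intelligent screening method to identify the best composition and processing for experimental validation. For step 1, composition-process-property data for existing UHS stainless steels were collected from the literature and grouped by their main strengthening precipitates, namely the R-phase (102 sets), Cu clusters (124 sets) and Ni3Ti (116 sets); the R-phase data were used to train the SVR-PM model and, ultimately, to design high-hardness, R-phase-strengthened UHS stainless steels. For step 2, the inputs of the SVR-PM model were the composition and processing of the steel: the composition inputs were the contents of the six main alloying elements of UHS stainless steels, and the processing inputs were the ageing temperature and ageing time. In addition, the PM parameters, namely the precipitate volume fraction Vf and the precipitation driving force Df, both calculated with Thermo-Calc, were introduced as model inputs to guide the ML process. Of the 102 R-phase data sets, 80 were randomly selected as the training set and the remaining 22 as the testing set; the data were randomly partitioned 500 times to build 500 SVR-PM models, and the mean and best prediction results of these models were used as the evaluation indices. The experimental and predicted values for the UHS stainless steels are compared in Fig. 1; the results show that the SVR-PM model has good generalization ability and high prediction accuracy. For step 3, the prediction of the SVR-PM model served as the objective function of the GA, which was used to search the solution space for the composition and processing parameters giving the best hardness; after screening the results from the 500 SVR-PM models, 116 optimal composition-processing sets were finally identified. For step 4, a support vector classifier (SVC) model was built to further screen these 116 results for highly reliable compositions and processing conditions; two alloy compositions were finally designed, as given in Table 1, and Alloy 2 was selected for experimental validation. The experimental validation shows excellent agreement between the optimal composition parameters designed by the model and the final properties.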
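The GA search of step 3 can be sketched as a real-coded genetic algorithm that maximizes predicted hardness within bounds on the composition and ageing parameters. This is an illustrative assumption, not the authors' implementation. The operators (tournament selection, blend crossover, Gaussian mutation, two-member elitism), the population size and the generation count are arbitrary choices. `objective` stands for one selected SVR-PM model; if that model needs Vf and Df, the objective would have to supply them for each candidate. The helper `keep_above` applies the rule of discarding designs whose predicted hardness does not exceed the dataset maximum of 51 HRC.

```python
import random

def genetic_maximize(objective, bounds, pop_size=40, generations=100,
                     mutation_rate=0.2, seed=0):
    """Real-coded GA maximizing `objective` inside the box `bounds` = [(lo, hi), ...]."""
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=objective, reverse=True)
        next_pop = scored[:2]  # elitism: carry the two best individuals unchanged
        while len(next_pop) < pop_size:
            # tournament selection: best of three random individuals, twice
            p1 = max(rng.sample(scored, 3), key=objective)
            p2 = max(rng.sample(scored, 3), key=objective)
            child = []
            for (lo, hi), a, b in zip(bounds, p1, p2):
                w = rng.random()
                v = w * a + (1 - w) * b  # blend (arithmetic) crossover
                if rng.random() < mutation_rate:
                    v += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
                child.append(clip(v, lo, hi))
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=objective)
    return best, objective(best)

def keep_above(solutions, threshold=51.0):
    """Discard (solution, hardness) pairs that do not exceed the dataset maximum."""
    return [(x, v) for x, v in solutions if v > threshold]
```

In the workflow described above, `genetic_maximize` would run once per retained model, and the resulting `(best, value)` pairs would be passed to `keep_above` before classification.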
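Using only a small database compiled from the literature, the above materials design method was successfully applied to the design of UHS stainless steels, demonstrating the great potential of PM-guided ML models for the design of high-strength steels. This work used a single property, hardness, as the optimization target; multi-objective optimization will be the focus of future work.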
Keywords: Alloy design; Machine learning; Physical metallurgy; Stainless steel
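Professor and doctoral supervisor at the State Key Laboratory of Rolling and Automation (Northeastern University); recipient of the Thousand Young Talents Program (2015) and of the National Natural Science Foundation of China Excellent Young Scientists Fund (2017). He received his PhD in Materials Science and Engineering from Delft University of Technology, the Netherlands, in 2009 (outstanding PhD graduate, top 4%). Before returning to China, he worked as a researcher and senior researcher at the ArcelorMittal Global R&D Centre and as an assistant professor at Delft University of Technology. His work has long focused on the computational design and industrial development of advanced high-strength steels based on the materials genome concept. In 2012 he was named Best Young Scientist by the Royal Netherlands Society of Sciences, and he has been selected for talent programmes including Leading Talent in Scientific and Technological Innovation for Young and Middle-aged Scientists of the Ministry of Science and Technology (2018), Leading Talent in Scientific and Technological Innovation of the Ten Thousand Talents Program of the Organization Department of the CPC Central Committee (2019), 'Top Young Talent' of the Xingliao Talent Program of Liaoning Province (2018) and 'Innovation Leading Talent' of the Huaishang Talent Program of Jiangsu Province (2017). He has published more than 70 SCI/EI-indexed papers.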
Email: xuwei@ral.neu.edu.cn