The system design and the standard framework of material genome engineering data
Hong Wang
School of Materials Science and Engineering and the MGI Center, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
EXTENDED ABSTRACT: Material genome engineering is a new R&D concept in materials science. Through a paradigm shift from the "trial and error" method to a data-driven mode characterized by "data + artificial intelligence", it enables the rational design of new materials and processes. In this mode, materials research focuses on data generation and data processing, so that the correlations among composition, structure, process and performance can be established faster, more efficiently and at lower cost.
Data are among the foundations of data-driven materials science. The material genome engineering data system is a new material data system designed top-down to promote the open sharing of digital data. It is named "material genome engineering data" to distinguish it from conventional data forms: each data item captures a specific relationship between material parameters, that is, the composition-structure-process-performance relationship, or part of it.
The material genome engineering data system consists of two parts: a storage system and an application system.
The storage system specifies the content, form and format of stored data and the way data are composed. Its core idea is reflected in the General rule for materials genome engineering data (T/CSTM 00120-2019), issued in 2019, which is intended to ensure that the data comply with the FAIR principles (findable, accessible, interoperable, reusable). Under the general rule, each data item covers only a single action (preparation / characterization / processing), i.e. it contains only one of the following three types of information: a single sample (actual, or virtual when generated by calculation), the original data generated by a single measurement (experiment or calculation), or the derived data generated by a single analysis. Because a single data item carries no association relationships, the greatest flexibility is left for organizing the data at the time of use. In addition, each data item is given a unique and permanent identifier (DOI, GB/T 32843-2016, or others). All data stored in the database shall be standardized.
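The storage rules above can be illustrated with a minimal sketch. It is not the schema defined by the general rule: the names `RecordKind` and `StorageRecord`, the free-form `payload` dictionary and the UUID-based identifier are all assumptions made for illustration (a real system would register a DOI or a GB/T 32843-2016 identifier instead).

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class RecordKind(Enum):
    """The three kinds of information a single data item may hold."""
    SAMPLE = "sample"    # one actual or virtual (computed) sample
    RAW = "raw"          # original data from one measurement
    DERIVED = "derived"  # derived data from one analysis


@dataclass(frozen=True)
class StorageRecord:
    """Hypothetical storage record: one action, one kind, one identifier."""
    kind: RecordKind
    payload: dict
    # Assumption: a UUID stands in for a registered permanent identifier.
    identifier: str = field(
        default_factory=lambda: f"urn:uuid:{uuid.uuid4()}"
    )

    def __post_init__(self):
        # Reject anything that is not exactly one of the three kinds.
        if not isinstance(self.kind, RecordKind):
            raise ValueError("kind must be a RecordKind member")


# The record holds no links to other records; association is left
# to the application side at the time of use.
sample = StorageRecord(RecordKind.SAMPLE, {"composition": {"Fe": 0.7, "Ni": 0.3}})
raw = StorageRecord(RecordKind.RAW, {"technique": "XRD"})
```

Keeping the record free of references to other records mirrors the point made above: relationships are not fixed at storage time, so the same record can be reused in different application data sets.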
The application system specifies the content, form and format of application data. In every case, application data must carry the required material information, i.e. the composition-structure-process-performance relationship, or part thereof. The elements contained in the application data are defined by the user as needed, which establishes a template. The content of an application data set is retrieved from the material genome engineering database and assembled on demand each time; the assembled set is not stored in the material genome engineering database. A standard vocabulary must be used for data search.
In accordance with this system design, the data standard framework also consists of two parts: storage data standards and application data standards. The storage data standards are a series of standards for data generated by specific technologies, in compliance with the general rule. An application data standard is a template for a specific technical field; it can be discussed and determined by domain experts and established as a professional application template standard through normal procedures.
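The template-and-assemble idea described above can be sketched as follows. This is an illustration under stated assumptions, not an interface defined by any standard: the four-term `VOCABULARY`, the functions `make_template` and `assemble`, the in-memory `database` dictionary and the identifiers `id-1` to `id-3` are all hypothetical.

```python
# Assumption: a tiny controlled vocabulary standing in for the
# standard vocabulary that the text requires for data search.
VOCABULARY = {"composition", "structure", "process", "performance"}


def make_template(terms):
    """Build a user-defined template, accepting only vocabulary terms."""
    unknown = set(terms) - VOCABULARY
    if unknown:
        raise ValueError(f"terms not in vocabulary: {sorted(unknown)}")
    return tuple(terms)


def assemble(template, database, identifiers):
    """Assemble one application data row from independent stored records.

    The row is built on demand and returned to the caller; nothing is
    written back to the database.
    """
    row = {term: None for term in template}
    for identifier in identifiers:
        record = database[identifier]
        for term in template:
            if term in record:
                row[term] = record[term]
    return row


# Hypothetical stored records, each covering a single action.
database = {
    "id-1": {"composition": {"Fe": 0.7, "Ni": 0.3}},
    "id-2": {"process": "anneal 600 C, 1 h"},
    "id-3": {"performance": {"hardness_HV": 180}},
}

template = make_template(["composition", "process", "performance"])
row = assemble(template, database, ["id-1", "id-2", "id-3"])
print(row)
```

In this sketch the user decides which stored records belong together by supplying their identifiers, which reflects the design choice that associations are made at the time of use and not at storage time.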
REFERENCES
[1] Wang H, Xiang X, Zhang L. Data + AI is the core of materials genomic engineering. Sci. Technol. Rev. 2018; 36(14):15–21.
[2] Wang H, Xiang X, Zhang L. On the Materials Innovation Infrastructure. Engineering. 2020; 6:609–611.
[3] General rule for materials genome engineering data. China Standards of Testing and Materials, T/CSTM 00120-2019, Aug 13, 2019. In Chinese.
Dr. Hong Wang is a "Zhiyuan" Chair Professor and Director of the Materials Genome Initiative Center, Shanghai Jiao Tong University. He earned a B.S. in Geophysics from Peking University and a Ph.D. in Materials Science and Engineering from the University of Illinois at Urbana-Champaign. He worked in thin-film research for many years with global companies such as SONY, Panasonic and Guardian Industries Corp. in the United States before joining the China Building Materials Academy, Beijing, in 2010 as Chief Scientist of the National Research Center for Glass Processing and Associate Director of the State Key Laboratory of Green Building Materials. Since 2012, he has been actively promoting in China the Materials Genome Initiative, a new paradigm for accelerating the path from materials discovery to deployment. His current research also includes the development of coated glass and smart windows for energy-efficient buildings, as well as solar heat conversion coatings.