4-16. Procedure for building a standard specific dataset based on materials genome database

Hong Wang¹, Haiqing Yin² and Lanting Zhang¹

1. Materials Genome Initiative Center and the School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240

2. Materials Design Group, Collaborative Innovation Center of Steel Technology, University of Science and Technology Beijing, Beijing, 100083

Abstract: The recently released CSTM standard “General Rules for Materials Genome Engineering Data” specifies the information that must be collected and the content that must be included during data production, in order to meet the requirements of data-driven materials science research. Under the “General Rules”, materials genome engineering (MGE) data are divided into three classes: sample information, source data (unprocessed data), and processed data (data obtained by analyzing and processing existing data). Each action (sample preparation, characterization, calculation, or data processing) is defined as a stand-alone entry unit and is assigned an independent resource identifier (a DOI or an identifier under the Chinese national standard GB/T 32843-2016). Each data entry should include metadata describing the corresponding action as completely as possible. By breaking the direct link between a material and its parameters, the “General Rules” are designed to give maximum flexibility in using and combining data, so that data use and collection conform to the FAIR principles (Findable, Accessible, Interoperable, and Reusable) and data sharing is promoted. Listing sample information as a separate class of data is a distinctive choice. Its greatest advantage is that the samples themselves become a shared resource conforming to the FAIR principles, so that samples, like data, can be found, shared, and reused.

Keywords: Intelligent design of materials; Software; Database; Industrial product design
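The entry structure described in the abstract (three data classes, one stand-alone entry per action, an independent identifier, and metadata about the action) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and field names, the identifier strings, the metadata values, and the `derived_from` link between entries are hypothetical and are not taken from the standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class DataClass(Enum):
    """The three data classes named in the abstract."""
    SAMPLE = "sample information"
    SOURCE = "source data"
    PROCESSED = "processed data"


@dataclass(frozen=True)
class Entry:
    """One stand-alone entry unit, created by a single action."""
    identifier: str          # placeholder string; stands in for a DOI or GB/T 32843-2016 identifier
    data_class: DataClass
    action: str              # e.g. "sample preparation", "characterization"
    metadata: Dict[str, str] = field(default_factory=dict)
    derived_from: Tuple[str, ...] = ()   # assumption: entries reference each other by identifier


def provenance(entry_id: str, registry: Dict[str, Entry]) -> List[str]:
    """Return identifiers of all upstream entries, nearest first."""
    seen: List[str] = []
    queue = list(registry[entry_id].derived_from)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        queue.extend(registry[current].derived_from)
    return seen


# Hypothetical entries: a sample, a measurement on it, and an analysis of the measurement.
sample = Entry("id:sample-001", DataClass.SAMPLE, "sample preparation",
               {"method": "arc melting"})
scan = Entry("id:source-001", DataClass.SOURCE, "characterization",
             {"technique": "XRD"}, ("id:sample-001",))
fit = Entry("id:processed-001", DataClass.PROCESSED, "data processing",
            {"procedure": "peak fitting"}, ("id:source-001",))

registry = {e.identifier: e for e in (sample, scan, fit)}
print(provenance("id:processed-001", registry))
```

Because each entry carries its own identifier and metadata, a processed result is not tied directly to one material record. In this sketch it can be traced back to its sample through identifiers, and the same sample entry can be referenced by any number of later measurements.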



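1. Materials Genome Joint Research Center and School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240

2. Materials Design Group, Collaborative Innovation Center of Steel Technology, University of Science and Technology Beijing, Beijing, 100083

Abstract (Chinese version): The recently released General Rules for Materials Genome Engineering Data (T/CSTM 00120-2019) of the Chinese Society for Testing and Materials (CSTM) specifies, based on the data needs of materials science in the data-driven mode, the information that must be collected and the content that must be included in the data production process. It divides data into three classes: sample information, source data (unprocessed data), and derived data (data obtained through analysis and processing). Each data entry is assigned a unique and permanent scientific resource identifier (a DOI, or an identifier according to national standard GB/T 32843-2016) and includes, as metadata, thoroughly recorded information about the corresponding sample preparation, characterization, or data processing event. The “General Rules” are intended to break the material-parameter association that is intrinsic to current data, providing maximum flexibility in using and combining data. This ensures that the use and collection of data conform to the FAIR principles (Findable, Accessible, Interoperable, Reusable) and promotes society-wide data sharing. Listing samples as a separate class of data is something no previous data standard has done. Its greatest advantage is that not only do the data satisfy the FAIR principles, but the samples themselves also become a social resource conforming to the FAIR principles, so that samples can be shared, used for multiple purposes, and reused.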




Brief Introduction of Speaker


Email: hongwang2@sjtu.edu.cn