Image Processing Method for Material Document Data Extraction
Yuexing Han*, Jinhua Xia, Yinggang Wang, Jiawang Zhang, Rui Zhang, Qiaochuan Chen, Hao Wang, Huiran Zhang
Shanghai University, Shanghai, 200444, China
EXTENDED ABSTRACT: Exploring the macroscopic properties of materials through the statistical regularities of material data plays an important role in improving material preparation processes and developing new materials. The material literature contains a large amount of visual information. This information is complex, diverse, unevenly distributed and available only in small samples, which poses great challenges to researchers in the materials field. At present, most data collection, processing and analysis in the material literature is done manually by researchers. As computer technology develops, automatically obtaining experimental and characterization data from the material literature has become an inevitable trend. In this study, we propose a method for preliminary processing of the visual data in the material literature.
Data extraction from material literature faces several challenges, including understanding visual data, linking it to the corresponding interpretive information, and extracting the key information. To address these challenges, we study an image processing method for extracting key data from material literature. The steps are as follows. First, we use keywords to automatically download relevant scientific literature from scientific journals. Second, we classify the kinds of information in the literature, excluding data such as text, material images and author avatars, and retaining only table images, numerical images and formula images. Third, the retained images are classified as table, numerical or formula images. Fourth, we extract data from the tables, extract curve data from the numerical images, and perform preliminary formula recognition. Finally, the recognized information is saved to a database in a specified format. Through these steps, material data and image information in the literature can be extracted rapidly.
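The filtering, extraction and storage steps described above can be sketched in a few lines of Python. This is only an illustrative skeleton under stated assumptions, not the authors' implementation: the `Region` structure, the `extract` and `run_pipeline` functions, the `extracted` table layout and the choice of SQLite with JSON-encoded rows are all hypothetical, and the extractor bodies are placeholders standing in for real table parsing, curve digitization and formula recognition.

```python
import json
import sqlite3
from dataclasses import dataclass

# Region types that the abstract says are kept for extraction.
KEPT_TYPES = {"table", "numerical", "formula"}


@dataclass
class Region:
    """One classified region of a document page (hypothetical structure)."""
    doc_id: str
    kind: str      # e.g. "text", "material_image", "avatar", "table", ...
    payload: dict  # stand-in for the cropped image or its parsed content


def extract(region: Region) -> dict:
    """Dispatch to a type-specific extractor; the bodies are placeholders."""
    if region.kind == "table":
        return {"rows": region.payload.get("rows", [])}
    if region.kind == "numerical":
        return {"points": region.payload.get("points", [])}
    if region.kind == "formula":
        return {"latex": region.payload.get("latex", "")}
    raise ValueError(f"unsupported region kind: {region.kind}")


def run_pipeline(regions, conn) -> int:
    """Filter regions, extract data, and store each result as a JSON row.

    Returns the number of rows stored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted "
        "(doc_id TEXT, kind TEXT, data TEXT)"
    )
    stored = 0
    for region in regions:
        if region.kind not in KEPT_TYPES:
            continue  # drop text, material images and author avatars
        data = extract(region)
        conn.execute(
            "INSERT INTO extracted VALUES (?, ?, ?)",
            (region.doc_id, region.kind, json.dumps(data)),
        )
        stored += 1
    conn.commit()
    return stored
```

Calling `run_pipeline(regions, sqlite3.connect(":memory:"))` with a mixed list of regions would store only the table, numerical and formula entries. Keeping the type filter separate from the extractors means each placeholder can be replaced by a real recognizer without touching the storage step.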
This research can help researchers in related fields extract a wide range of material research and development data from massive literature, and it can shorten the development cycle of new materials.
Yuexing Han graduated from the National University of Electrical and Communication of Japan. He worked as a postdoctoral researcher at the Japan Institute of Industrial Technology and Tokyo University of Technology. He is now an associate professor, doctoral supervisor and master's supervisor at the School of Computer Engineering and Science of Shanghai University, and a Pujiang Talent. He has long been engaged in research on material image processing and material literature mining. He has published more than 50 papers in academic journals and holds 5 licenses and 12 software copyrights.