Databases

New databases

Perhaps the most striking signature of combinatorial experimentation is the rapid generation of large amounts of data that convey the process, structure, and property information central to materials science. In this sense, however the resulting mass of data is organized, combinatorial experimentation delivers a wealth of new materials data.

To be really useful, these data must be organized in some systematic way, so that they constitute a database of measurable utility. At the very minimum, this may mean simply publishing or disseminating a selected subset of the data in the form of tables and charts. Usually this will be accompanied by descriptions of the methods and protocols by which the materials synthesis and characterization were carried out, since experimental methodology can significantly influence the results. The database is even more valuable if it has been structured in anticipation of more general and comprehensive formats, including fields for parameters that may not have been recorded in the new measurements. Such an approach anticipates more detailed experiments to follow, as well as comparison with materials databases from other sources, past and future.

To the extent possible, materials databases should be highly structured, presenting all known process, structure, and property information in a uniform format, ideally one agreed upon by a major group of practitioners. They should also include significant annotation of synthesis and characterization methodology, likewise in a structured representation.
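
As an illustration only, a uniform record of this kind might be sketched in Python as follows. The class names, field names, units, and sample values are all hypothetical placeholders, not an agreed community format. The sketch groups entries into process, structure, and property information, attaches a methodology annotation to each entry, and reserves fields for parameters that have not yet been measured.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One recorded quantity plus the method used to obtain it."""
    name: str
    value: Optional[float]        # None marks a field reserved but not yet measured
    unit: str
    method: str = "unspecified"   # structured annotation of methodology


@dataclass
class SampleRecord:
    """Hypothetical uniform record: process, structure, and property groups."""
    sample_id: str
    composition: Dict[str, float]
    process: List[Entry] = field(default_factory=list)
    structure: List[Entry] = field(default_factory=list)
    properties: List[Entry] = field(default_factory=list)

    def unrecorded(self) -> List[str]:
        """Names of entries that exist in the format but hold no value yet."""
        groups = (self.process, self.structure, self.properties)
        return [e.name for g in groups for e in g if e.value is None]


# Placeholder values, invented purely to show the shape of a record.
record = SampleRecord(
    sample_id="library-01/spot-017",
    composition={"A": 0.6, "B": 0.4},
    process=[Entry("anneal_temperature", 500.0, "degC", "tube furnace")],
    structure=[Entry("lattice_constant", None, "angstrom")],
    properties=[Entry("resistivity", 1.2e-4, "ohm*cm", "four-point probe")],
)

print(record.unrecorded())             # fields awaiting later experiments
print(asdict(record)["process"][0])    # plain dict, ready for serialization
```

Keeping unmeasured parameters as explicit empty fields, rather than omitting them, is one way to leave room for later experiments and for comparison with databases from other sources.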

Ultimately, the goal of informatics is not simply to organize data into useful databases, but to analyze and refine the data to generate models (analytical, response surface, reduced order, or other) that quantitatively reflect the behavior of the large quantities of data around which the database was built. Thus, after other informatics tools such as data filtering and data mining have been applied, models representing the data in mathematical and/or statistical form should emerge, in some sense as the highlight of the database. If substantial filtering or mining has occurred, some record of its effects should be included, at least as annotation.
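
To make the idea of a response surface model concrete, the following minimal sketch fits a one-variable quadratic, y = b0 + b1*x + b2*x^2, by ordinary least squares. It assumes a single composition variable x and a single measured property y. The data are synthetic, generated from a known polynomial, and the function names are hypothetical. Real combinatorial data would normally involve more variables and noise.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]          # design-matrix rows [1, x, x^2]
    n = 3
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    aug = [ata[i] + [aty[i]] for i in range(n)]   # augmented matrix
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * b[j] for j in range(i + 1, n))
        b[i] = s / aug[i][i]
    return b


def predict(b, x):
    """Evaluate the fitted surface at x."""
    return b[0] + b[1] * x + b[2] * x * x


# Synthetic data from a known curve, so the fit can be checked by eye.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 + 3.0 * x - 4.0 * x * x for x in xs]

coeffs = fit_quadratic(xs, ys)
x_opt = -coeffs[1] / (2.0 * coeffs[2])   # stationary point of the fitted quadratic
residuals = [y - predict(coeffs, x) for x, y in zip(xs, ys)]

print(coeffs, x_opt)
```

The fitted coefficients, together with the residuals, are the kind of compact model that could be stored alongside the raw data as annotation.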

Legacy databases

The world of materials science and engineering is replete with prior data, which in some sense constitute legacy databases. Such legacy data are often compromised by current standards because they are incomplete, because they were obtained with outmoded synthesis or characterization techniques, and/or because they are not sensibly structured. Nevertheless, it is valuable to scrutinize and use such legacy databases, particularly if enough information about methodology is included to judge and filter the data.
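
One simple way to judge and filter legacy records by their methodology is sketched below. The record layout, the method labels, and the set of accepted methods are hypothetical assumptions made for illustration. Records that fail a check are flagged with a reason rather than discarded, so they remain available for later scrutiny.

```python
def screen_legacy(records, accepted_methods):
    """Split legacy records into usable ones and flagged ones with a reason."""
    kept, flagged = [], []
    for rec in records:
        method = rec.get("method")
        if method is None:
            flagged.append((rec["id"], "no methodology recorded"))
        elif method not in accepted_methods:
            flagged.append((rec["id"], "method not in accepted set"))
        elif rec.get("value") is None:
            flagged.append((rec["id"], "value missing"))
        else:
            kept.append(rec)
    return kept, flagged


# Hypothetical accepted characterization methods and placeholder records.
accepted = {"xrd", "four_point_probe"}
legacy = [
    {"id": "r1", "method": "xrd", "value": 4.05},
    {"id": "r2", "method": None, "value": 4.10},
    {"id": "r3", "method": "visual_estimate", "value": 3.9},
    {"id": "r4", "method": "four_point_probe", "value": None},
]

kept, flagged = screen_legacy(legacy, accepted)
print([r["id"] for r in kept])
print(flagged)
```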

Other non-empirical or derived databases may play an important role as well. Databases derived from various styles of theoretical work can convey very fundamental parameters that should ultimately be the basis for a deeper physical understanding of materials behavior (e.g., the Pauling electronegativity scale). Other such databases may derive from well-refined experimental work (e.g., lattice constants and crystal structures), or from numerical computations that are compared to experiment to derive fitting parameters (e.g., pseudopotential radii). Ultimately, the value of such databases lies in the fact that subsequent analyses, particularly data mining, may identify correlations with these parameters and so enable a deeper physical understanding of materials behavior.
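
A minimal sketch of such a correlation search follows, assuming a derived database that tabulates one descriptor per material and a measured database that tabulates one property per material. All material labels and numbers are invented placeholders, not real electronegativities or measurements, and the function names are hypothetical. The Pearson coefficient is used here as one simple measure of correlation. Data mining in practice would test many descriptors and use more robust statistics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def correlate_with_descriptor(measured, descriptor):
    """Correlate a measured property with a tabulated descriptor.

    Only materials present in both tables are used.
    """
    keys = sorted(set(measured) & set(descriptor))
    r = pearson([descriptor[k] for k in keys], [measured[k] for k in keys])
    return r, keys


# Placeholder tables: a derived descriptor and a measured property.
descriptor = {"M1": 1.0, "M2": 1.5, "M3": 2.0, "M4": 2.5, "M5": 3.0}
measured = {"M1": 10.2, "M2": 12.9, "M3": 16.1, "M4": 18.8, "M6": 7.0}

r, used = correlate_with_descriptor(measured, descriptor)
print(used, round(r, 3))
```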