The Status of the IUPAC InChI Chemical Structure Standard

It is well known that IUPAC has worked for many years on the systematization of chemical compounds and the standardization of their nomenclature and associated terminology. The motivation is easy to explain. The volume of data produced by researchers around the world grows dramatically every day, so a universal language of chemical communication is needed to merge and accumulate everything that has been discovered. The problem is easily illustrated by common names. Consider the homologous series of normal (unbranched) monocarboxylic acids: formic, acetic, propionic, butyric, valeric, caproic, enanthic, etc. These “aliases” usually derive from the natural sources the acids were extracted from or from the names of the compounds they were synthesized from. Without prior acquaintance, it is hard to deduce the structures from such names. The nomenclature proposed by IUPAC provides an effective formalism of roots, prefixes and suffixes that allows compounds to be named unambiguously (the root defines the number of carbon atoms in the parent chain; the suffix shows which class of compounds the structure belongs to): methanoic, ethanoic, propanoic, butanoic, etc. The nomenclature rests on a fundamental principle: “every name corresponds to only one compound, and every compound corresponds to only one name”. This automatically eliminates the ambiguity that arises with common and rational nomenclatures.
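The root-plus-suffix formalism can be sketched in a few lines of Python. This is only an illustration for the seven acids listed above, not a general nomenclature engine; the function name `systematic_acid_name` and the two lookup tables are hypothetical helpers introduced for this example.

```python
# Roots encode the number of carbon atoms in the parent chain.
ROOTS = {1: "meth", 2: "eth", 3: "prop", 4: "but", 5: "pent", 6: "hex", 7: "hept"}

# Common names of the unbranched monocarboxylic acids, keyed by carbon count.
COMMON_NAMES = {
    1: "formic", 2: "acetic", 3: "propionic", 4: "butyric",
    5: "valeric", 6: "caproic", 7: "enanthic",
}


def systematic_acid_name(carbons: int) -> str:
    """Return the systematic name of an unbranched monocarboxylic acid."""
    if carbons not in ROOTS:
        raise ValueError(f"no root stored for {carbons} carbon atoms")
    # "an" marks a saturated chain; "oic acid" marks the carboxylic acid class.
    return f"{ROOTS[carbons]}anoic acid"


for n in sorted(COMMON_NAMES):
    print(f"{COMMON_NAMES[n]} acid -> {systematic_acid_name(n)}")
```

The common name has to be looked up, while the systematic name is computed from the structure alone (here, just the carbon count).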

Once large data sets became available thanks to rapid progress in computer technology and the Internet, a new problem arose for IUPAC: complicated molecular structures had to be handled in a “string-like” form to allow efficient data exchange and fast search among billions of structures in chemical space. The new millennium was marked by the start of development of the IUPAC universal international chemical identifier. In April 2005, the first version of InChI (International Chemical Identifier) was released. Later, in January 2009, a hash-like variant of InChI, the InChIKey, was proposed. The main advantage of this representation is its constant length of 27 characters, whereas an InChI string has no such limit. It was developed especially for use in search engines. However, an InChIKey is not reversible: recovering the compound requires a stored dictionary that maps every InChIKey string to its compound.
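A minimal sketch of these two properties, the fixed length and the need for a lookup dictionary, might look as follows. The block layout used in the pattern (14 letters, a hyphen, 10 letters, a hyphen, 1 letter) is not described in the text and is stated here as an assumption; `KEY_TO_COMPOUND` is a hypothetical one-entry table standing in for a real database, and both function names are illustrative.

```python
import re
from typing import Optional

# Assumed layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Hypothetical dictionary: the key cannot be decoded, so it must be looked up.
KEY_TO_COMPOUND = {
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N": "caffeine",
}


def is_wellformed_inchikey(key: str) -> bool:
    """Check only the shape of the string, not whether the compound exists."""
    return INCHIKEY_PATTERN.fullmatch(key) is not None


def resolve(key: str) -> Optional[str]:
    """Return the stored compound for a key, or None if it is unknown."""
    if not is_wellformed_inchikey(key):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return KEY_TO_COMPOUND.get(key)


key = "RYYVLZVUVIJVGH-UHFFFAOYSA-N"
print(len(key), is_wellformed_inchikey(key), resolve(key))
```

A well-formed key that is missing from the dictionary resolves to `None`, which is exactly the limitation of a non-reversible identifier.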

The use of InChI and InChIKey opens exciting prospects beyond web search, data storage and database merging. An extension that covers mixtures is currently under development, which would make it possible to use InChI for labelling vials of reagents. The convenience of InChIKey can also be combined with QR codes. If every vial carries a QR code that stores its identification string, compounds in chemical storehouses can easily be sorted and located by automatic control systems, or simply with portable scanners or smartphones in the laboratory. QR codes provide a high level of error correction (the encoded data can still be recognized even when 20% of the image is damaged) and can store up to 3,000 characters. In the figures below, several different links containing the InChIKey of caffeine are encoded as QR codes.

https://pubchem.ncbi.nlm.nih.gov/compound/RYYVLZVUVIJVGH-UHFFFAOYSA-N
Link to PubChem
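As a rough sketch, the text placed in such a QR code can be assembled from the InChIKey alone. The URL prefix is taken from the PubChem link above, and `MAX_QR_CHARS` simply reuses the 3,000-character figure quoted in the text; the function name `label_payload` is hypothetical. Rendering the actual QR image would need a separate QR library and is left out here.

```python
# URL prefix taken from the PubChem link shown above.
PUBCHEM_COMPOUND_URL = "https://pubchem.ncbi.nlm.nih.gov/compound/"

# Capacity figure quoted in the text; treated here as an upper bound.
MAX_QR_CHARS = 3000


def label_payload(inchikey: str) -> str:
    """Build the link text to be encoded on a vial label."""
    payload = PUBCHEM_COMPOUND_URL + inchikey
    if len(payload) > MAX_QR_CHARS:
        raise ValueError("payload is too long for a single QR code")
    return payload


print(label_payload("RYYVLZVUVIJVGH-UHFFFAOYSA-N"))
```

Because the InChIKey has a fixed length, every label built this way has the same payload size, far below the capacity limit.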

So, there are many possible applications of InChI and InChIKey combined with QR codes that could simplify daily routine work in the chemical industry. They open many new directions in web and mobile development that can make chemistry better.