On July 14-15, more than 30 invited researchers, cheminformatics specialists, educators, publishers and librarians attended a workshop at the U.S. Environmental Protection Agency campus in Research Triangle Park, NC, to begin a dialogue on prioritizing digital data challenges in chemistry. Organized by the Research Data Alliance (RDA) and IUPAC, the workshop focused on standardizing the machine representation of chemical structures and chemical terminology. The long-term objective of this joint collaboration is to develop a digital infrastructure that will facilitate the global exchange of chemistry data.
“Chemical structures and terminology are integral to handling chemical data in any context, and universally applicable formats and normalization protocols will enable broader and more consistent use of data by any number of stakeholders,” said Leah McEwen, a co-founder of RDA’s Chemistry Research Data Interest Group (CRDIG) and a member of IUPAC’s Committee on Publications and Cheminformatics Data Standards (CPCDS), the two bodies that organized the workshop. “The key to success will be the ability of the larger community to adopt standard formats through engagement in standards of practice, from tool development to individual use and educational guidelines.”
Prior to the workshop, detailed overviews of each topic were sent to the broader IUPAC, RDA, chemical information professional, and education communities to gather their input and provide a foundation for the workshop discussions. The issues gathered were further distilled during intensive sessions led by McEwen and fellow CRDIG founders Evan Bolton, Tony Williams, and Stuart Chalk. These discussions identified several strawman proposals on which to focus further activity, including:
- developing a Best Practice for standardizing software so that the same chemical structure is generated regardless of which package is used;
- developing recommendations for, and standardizing the use of, a small set of open chemical file formats and representations to improve interoperability and minimize errors in chemistry data exchange;
- updating the IUPAC Chemical Structure Drawing standards to account for machine interpretation of chemical depictions, so that the chemist's intent is not corrupted when a drawing is converted into a machine-readable chemical structure;
- educating all stakeholders about chemical structure standardization, why it matters for chemical data exchange between humans and machines, and how these issues relate to their own work;
- developing a small-scale ontology of chemical terms based on those in the current IUPAC Orange Book; and
- analyzing the current chemical data transfer and communication landscape for potential applications of semantic terminology.
McEwen noted that additional topics are under consideration, such as developing a semantic web exchange format for chemical structures.
“This workshop grew out of the very successful Data Summit held by the Chemical Information Division (CINF) of the American Chemical Society earlier this year in San Diego, CA,” McEwen noted. “At that meeting we were able to identify a long list of ‘pain points’ in the exchange of chemical information, and we knew that further discussion was needed. The workshop is just the beginning of an ongoing dialogue between relevant stakeholders around the globe, and we are grateful that RDA and IUPAC were willing to help us get the dialogue rolling!”
Next steps include publishing minutes and reports from the workshop, along with several opportunities for the global chemical information community to discuss these critical issues:
- the symposium “Chemistry Data for the People: From Policy to Practice” at the ACS Meeting in Philadelphia, specifically the August 22 afternoon session, “Chemistry data pain points: Distilled, analyzed, and next steps”;
- SciDataCon 2016 in Denver, September 11-13, which will include a presentation on IUPAC activities, including a summary of the workshop; and
- the RDA 8th Plenary, also in Denver, September 15-17, with the Chemistry Research Data breakout discussions scheduled for Friday, September 16.
For more information or to be added to the mailing list for updates on this ongoing dialogue, contact Leah McEwen at [email protected], or visit the websites of RDA (www.rd-alliance.org/groups/chemistry-research-data-interest-group.html) and IUPAC (www.iupac.org/body/024).
& from RDA website … We DIG Chemistry!