Sciencemadness Discussion Board

bbcode support for SMILES structures to images

Nicodem - 12-7-2014 at 00:56

I wish we could implement a SMILES module to show chemical structures. That would be much more useful. The defunct Hive forum had one that was very good. I have no idea what it would take, or if it is possible at all on this platform. There are some free SMILES, CAS or chemical name to structure translators on the web (for example: http://www.openmolecules.org/name2structure.html). I wonder if it is possible to integrate their service into the forum code?

Polverone - 12-7-2014 at 18:21

It was a lot more work than MathJax, but I integrated the openmolecules.org SMILES rendering service. I turned bbcode off for this post to show the tags.

Fructose:
[smiles]O[C@H]1[C@H](O)[C@H](O[C@]1(O)CO)CO[/smiles]

Caffeine:
[smiles]Cn1cnc2c1c(=O)n(c(=O)n2C)C[/smiles]

EDIT: Apparently the name service recognizes more than SMILES!

Water:
[smiles]water[/smiles]

PETN:
[smiles]PETN[/smiles]

[Edited on 7-13-2014 by Polverone]

Polverone - 12-7-2014 at 18:23

And now the real result:

Fructose:


Caffeine:


EDIT: Apparently the name service recognizes more than SMILES!

Water:


PETN:


[Edited on 7-13-2014 by Polverone]

Polverone - 12-7-2014 at 18:28

And finally, the meat of the code. To integrate it in another message board you'd want to instantiate a SmilesCode instance and call the bbcode_replace method on the message body after other bbcode tags have been transformed.

Code:
<?php
/**
 * SMILES image manager based on openmolecules.org name2structure
 * service.
 **/

if (!defined('IN_CODE')) {
    header('HTTP/1.0 403 Forbidden');
    exit("Not allowed to run this file directly.");
}

define("IMAGE_DIR", "/var/www/smiles/");
define("SMILES_IMG_LOC", "www.sciencemadness.org/smiles/");
define("OM_PREFIX", "http://n2s.openmolecules.org/?name=");
define("FAILURE_IMAGE", "smilesfailure.png");

class SmilesCode {

    // convert SMILES to img representing corresponding structure
    public function render($smiles) {
        $smiles = trim($smiles);
        $key = md5($smiles);
        $image_name = $key . '.png';
        $file_name = IMAGE_DIR . $image_name;

        // image not stored locally -- try to generate image file
        if (!file_exists($file_name)) {
            // keep the returned name so a failed fetch shows the failure image
            $image_name = $this->generate_file($smiles, $image_name);
        }

        if (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') {
            $protocol = 'https://';
        }
        else {
            $protocol = 'http://';
        }

        $img_url = $protocol . SMILES_IMG_LOC . $image_name;
        $img = "<img src=\"$img_url\" />";
        return $img;
    }

    // generate an image file using the openmolecules.org server
    public function generate_file($smiles, $image_name) {
        // percent-encode so '%', '#' and '+' survive the query string
        $remote_url = OM_PREFIX . rawurlencode($smiles);
        $http = curl_init($remote_url);
        curl_setopt($http, CURLOPT_RETURNTRANSFER, 1);
        $result = curl_exec($http);
        $status = curl_getinfo($http, CURLINFO_HTTP_CODE);
        curl_close($http);

        if ($result === false || $status != 200) {
            $image_name = FAILURE_IMAGE;
        }
        else {
            $out_name = IMAGE_DIR . $image_name;
            file_put_contents($out_name, $result);
        }
        return $image_name;
    }

    // replace all bbcode SMILES with molecular images
    public function bbcode_replace($message) {
        $codes = $this->extract_smiles($message);
        foreach ($codes as $code) {
            $bare_smiles = str_replace(array('[smiles]', '[/smiles]'), '', $code);
            $rendered = $this->render($bare_smiles);
            $message = str_replace($code, $rendered, $message);
        }
        return $message;
    }

    // get all bbcode SMILES markup
    public function extract_smiles($message) {
        $codes = array();
        $offset = 0;
        $begin_tag = '[smiles]';
        $end_tag = '[/smiles]';
        $current_tag = $begin_tag;

        while ($offset < strlen($message)) {
            $old_pos = $offset;
            $pos = strpos($message, $current_tag, $offset);
            if ($pos === false) {
                break;
            }
            else {
                $offset = $pos;
            }

            // found begin -- switch to end search
            if ($current_tag == $begin_tag) {
                $current_tag = $end_tag;
            }
            // found end -- capture contents and switch back to begin search
            else {
                $smile = substr($message, $old_pos, $pos - $old_pos + strlen($end_tag));
                array_push($codes, $smile);
                $current_tag = $begin_tag;
            }
        }
        return $codes;
    }
}
?>



[Edited on 7-13-2014 by Polverone]
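
The caching step in render() can be looked at in isolation. The following is only a sketch: smiles_cache_name is a hypothetical helper, not part of the posted class, and it simply mirrors the md5-of-trimmed-input naming used above.

```php
<?php
// Hypothetical helper: mirrors the md5-based file naming used in render().
function smiles_cache_name(string $input): string
{
    // Trim first so "water" and " water " share one cached image.
    return md5(trim($input)) . '.png';
}

echo smiles_cache_name('Cn1cnc2c1c(=O)n(c(=O)n2C)C'), "\n";
```

Because md5 is case-sensitive, "Water" and "water" would be stored as two separate files under this scheme, even if the remote service draws the same structure for both.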

Brain&Force - 13-7-2014 at 11:10

OK, I can't resolve this SMILES input:

[I-].[I-].[I-].[Tb+3](~O=C1C=C(C)N(C)N1C:2:C:C:C:C:C:2)(~O=C3C=C(C)N(C)N3C:4:C:C:C:C:C:4)(~O=C5C=C(C)N(C)N5C:6:C:C:C:C:C:6)(~O=C7C=C(C)N(C)N7C:8:C:C:C :C:C:8)(~O=C9C=C(C)N(C)N9C:%10:C:C:C:C:C:%10)~O=C%11C=C(C)N(C)N%11C:%12:C:C:C:C:C:%12

This is hexakis(antipyrine)terbium iodide - a coordination complex with coordination bonds.

Diamminesilver(I) doesn't work either:

[Ag+].N.N

Tetrachloronickelate works.



Is there something wrong with the input, or is this a problem on the software's end? The first structure was generated by me, the last two were pulled off of ChemSpider.

Polverone - 13-7-2014 at 12:51

There is a stray space in your SMILES: "...N7C:8:C:C:C :C:C:8..."
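
A stray space like this is easy to miss by eye. As a rough sketch, a check along the following lines could flag it before the string is sent anywhere. stray_whitespace_offsets is a hypothetical helper, and it only reports positions rather than stripping them, on the assumption that chemical names entered in the same tag may legitimately contain spaces.

```php
<?php
// Hypothetical helper: report offsets of whitespace inside a tag body.
// Leading and trailing whitespace is ignored, since it is trimmed anyway.
function stray_whitespace_offsets(string $input): array
{
    $offsets = array();
    $trimmed = trim($input);
    if (preg_match_all('/\s/', $trimmed, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as $match) {
            $offsets[] = $match[1]; // byte offset within the trimmed string
        }
    }
    return $offsets;
}

print_r(stray_whitespace_offsets('N7C:8:C:C:C :C:C:8'));
```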

Even with the space removed, the openmolecules.org service cannot properly parse it. In fact, out of the toolkits incorporated into Cinfony, only RDKit could parse it: http://www.rdkit.org/

It looks like most chemoinformatics toolkits are not built with coordination chemistry in mind.

I had considered using RDKit to write the back end of the SMILES rendering service for here on the board. But I tested it with some complex chiral organics and the openmolecules.org service made better images, plus it would be more complicated to wrap RDKit. I suppose I could add an RDKit fallback renderer to try to handle whatever openmolecules.org fails on.
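
The fallback idea can be sketched independently of any particular toolkit. In the sketch below everything is an assumption made for illustration: render_with_fallback is a hypothetical function, and the two closures are stand-ins for a remote-service renderer and a local one, not real bindings to openmolecules.org or RDKit.

```php
<?php
// Try each renderer in order; a renderer returns image bytes or null on failure.
function render_with_fallback(string $smiles, array $renderers): ?string
{
    foreach ($renderers as $renderer) {
        $image = $renderer($smiles);
        if ($image !== null) {
            return $image;
        }
    }
    return null; // caller can substitute a failure image here
}

// Stand-in renderers for illustration only. The "primary" one pretends to
// fail on the '~' bond symbol; a real one would call the remote service.
$primary = function (string $smiles): ?string {
    return strpos($smiles, '~') === false ? "primary:$smiles" : null;
};
$secondary = function (string $smiles): ?string {
    return "fallback:$smiles";
};

echo render_with_fallback('CCO', array($primary, $secondary)), "\n";
```

Keeping the renderers as plain callables means the order can be changed, or a third renderer added, without touching the loop.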

HeYBrO - 13-7-2014 at 14:14

I know who has a new signature :D
Edit: it is way too big for a signature, so maybe disable that...

[Edited on 13-7-2014 by HeYBrO]

Screen Shot 2014-07-14 at 8.15.41 am.png - 42kB

Polverone - 13-7-2014 at 15:57

I added code to automatically trim excess border space from generated images. They're still kind of large though. Should I use thumbnails plus links to full size images?

Nicodem - 14-7-2014 at 00:17

Beautiful job! The openmolecules.org service has a few limitations, but at least it is free and it looks like it has ambitions to stay active in the future. I agree that the images are big, but I do not like the idea of thumbnails. Is it possible to resize them automatically instead? I think they would still be OK at half size.

Also note that the service recognizes not only SMILES, chemical names, some trivial names and generic abbreviations, but also CAS numbers.

For example (with codes switched off):

trivial name: strychnine
[smiles]strychnine[/smiles]


CAS number: ergocristine = 511-08-0
[smiles]511-08-0[/smiles]


generic API name: indinavir
[smiles]indinavir[/smiles]
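
How the service itself tells a CAS number from a name is not stated here. On the forum side, though, a CAS Registry Number can be recognized by its format and check digit: the last digit equals the weighted sum of the other digits (weight 1 for the rightmost, 2 for the next, and so on) modulo 10. A minimal sketch, with is_valid_cas as a hypothetical helper:

```php
<?php
// Return true if $input has CAS number format and a valid check digit.
function is_valid_cas(string $input): bool
{
    if (!preg_match('/^(\d{2,7})-(\d{2})-(\d)$/', trim($input), $m)) {
        return false;
    }
    $digits = strrev($m[1] . $m[2]); // rightmost digit gets weight 1
    $sum = 0;
    for ($i = 0; $i < strlen($digits); $i++) {
        $sum += ($i + 1) * (int) $digits[$i];
    }
    return $sum % 10 === (int) $m[3];
}

var_dump(is_valid_cas('511-08-0'));
```

For 511-08-0 the sum is 8×1 + 0×2 + 1×3 + 1×4 + 5×5 = 40, and 40 mod 10 = 0 matches the final digit.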

Nicodem - 14-7-2014 at 00:18

...resulting in...

trivial name: strychnine



CAS number: ergocristine = 511-08-0



generic API name: indinavir

Chemosynthesis - 14-7-2014 at 00:57



Going to have to work on my salt SMILES. The wiki didn't work.


[Edited on 14-7-2014 by Chemosynthesis]
Just saw that :(.
Hmm.

Thank you both!

[Edited on 14-7-2014 by Chemosynthesis]

Polverone - 14-7-2014 at 00:57

They're automatically scaled to half size now. Too blurry?

Edit: Chemosynthesis, the smiles tag should be in lower case. I didn't try to make it case-insensitive.

[Edited on 7-14-2014 by Polverone]
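
Case-insensitive matching would be one way around this. The sketch below is not the board's code: replace_smiles_tags is a hypothetical function that uses preg_replace_callback with the i flag and takes the renderer as a callable, so it can be tried without the network.

```php
<?php
// Replace [smiles]...[/smiles] in any letter case using a supplied renderer.
function replace_smiles_tags(string $message, callable $render): string
{
    return preg_replace_callback(
        '#\[smiles\](.*?)\[/smiles\]#is', // i: ignore case, s: allow newlines
        function (array $m) use ($render) {
            return $render(trim($m[1]));
        },
        $message
    );
}

// Stand-in renderer for illustration; a real one would return the image tag.
$demo = replace_smiles_tags('Water: [SMILES]water[/Smiles]', function ($s) {
    return "<img alt=\"$s\">";
});
echo $demo, "\n";
```

The non-greedy `(.*?)` keeps two tags on the same line from being merged into one match.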

Chemosynthesis - 14-7-2014 at 01:03

I might prefer 0.75 scale for easier visibility on valences, if that sounds good to others.
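
The thread does not say whether the scaling happens on the server or in the HTML. Assuming it is done through width and height attributes, the difference between 0.5 and 0.75 is a one-parameter change. scaled_size is a hypothetical helper and the 640x480 source size is made up for the example.

```php
<?php
// Compute display dimensions for an image at a given scale factor.
function scaled_size(int $width, int $height, float $scale): array
{
    return array(
        max(1, (int) round($width * $scale)),  // never collapse to zero
        max(1, (int) round($height * $scale)),
    );
}

list($w, $h) = scaled_size(640, 480, 0.75);
echo "<img src=\"example.png\" width=\"$w\" height=\"$h\">\n";
```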

Nicodem - 14-7-2014 at 01:23

They look perfectly fine at half size to me, but I guess that pretty much depends on the monitor size and settings at the user end.
Chemosynthesis, the molecules I posted above are considerably larger than anything we usually discuss on this forum. Check "normal sized" compounds like the ones that Polverone posted above. They should be fine unless you have a small monitor or low screen resolution settings.

The size is still slightly too big to depict simple schemes as one-liners (unless I reduce the browser display size), but it is comprehensible with some effort:

[smiles]c1(cc(c(cc1)O)OC)C=O[/smiles][b][size=6]+[/size][/b][smiles]acetophenone[/smiles] [b][size=6]→[/size][/b] [smiles]c1(cc(c(cc1)O)OC)/C=C/C(c2ccccc2)=O[/smiles]

Note: For the HTML codes of various arrows useful in chemical equations, see the list at [url=http://character-code.com/arrows-html-codes.php]http://character-code.com/arrows-html-codes.php[/url]

Nicodem - 14-7-2014 at 01:24

...resulting in...

+

Note: For the HTML codes of various arrows useful in chemical equations, see the list at http://character-code.com/arrows-html-codes.php

TheChemiKid - 14-7-2014 at 04:35

Test for vanillin:


EDIT: Hmm, the wiki picture looks like this, so it confused me. Sorry for the double post.
100px-Vanillin2.svg.png - 4kB

[Edited on 7-14-2014 by TheChemiKid]

Chemosynthesis - 14-7-2014 at 11:36

Quote: Originally posted by Nicodem  

Chemosynthesis, the molecules I posted above are considerably larger than anything we usually discuss on this forum. Check "normal sized" compounds like the ones that Polverone posted above. They should be fine unless you have a small monitor or low screen resolution settings.

Checked on a bigger screen and they look good. I tend to forget that I often use small screens for space, or my phone.
Awesome work again!

arkoma - 14-7-2014 at 11:53

This is wonderful--Kudos!



Code:
one of my favorite anthocyanidins, and all I had to enter was [smiles]petunidin[/smiles] Great job

The Volatile Chemist - 14-7-2014 at 13:17

Cheers! This is a great blessing! Thanks for the work put into it, I've always loved SMILES!

The Volatile Chemist - 20-7-2014 at 07:42

If you don't mind, I'm gonna test it here...


[Edited on 7-20-2014 by The Volatile Chemist]
That second one's Cephalostatin-1
It really doesn't like it... I think it has something to do with the '%' character used for ring-closure numbers above 9, but I don't know... According to Wikipedia, its SMILES should be
Code:
C[C@@](C)(O1)C[C@@H](O)[C@@]1(O2)[C@@H](C)[C@@H]3CC=C4[C@]3(C2)C(=O)C[C@H]5[C@H]4CC[C@@H](C6)[C@]5(C)Cc(n7)c6nc(C[C@@]89(C))c7C[C@@H]8CC[C@@H]%10[C@@H]9C[C@@H](O)[C@@]%11(C)C%10=C[C@H](O%12)[C@]%11(O)[C@H](C)[C@]%12(O%13)[C@H](O)C[C@@]%13(C)CO

[Edited on 7-20-2014 by The Volatile Chemist]
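
Whether the '%' is really the cause here cannot be told from the thread. One general point is certain, though: '%' is also the escape character in URLs, so a ring-closure token such as %10 placed unencoded in a query string can be decoded as a single control byte. A minimal sketch of percent-encoding a fragment before building a request URL, assuming the n2s.openmolecules.org prefix from the code posted earlier in the thread:

```php
<?php
// Assumed endpoint prefix, taken from the code posted earlier in the thread.
$prefix = 'http://n2s.openmolecules.org/?name=';

$fragment = 'C%10CCCCC%10';          // ring closure numbered 10
$encoded  = rawurlencode($fragment); // each '%' becomes '%25'

echo $prefix . $encoded, "\n";
```

Decoding the encoded form gives back the original fragment, whereas decoding the raw fragment would turn each %10 into the byte 0x10.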