Sciencemadness Discussion Board
Author: Subject: Looking for ideas for a new Synthesis Open Access Platform
Loptr
International Hazard
*****




Posts: 1347
Registered: 20-5-2014
Location: USA
Member Is Offline

Mood: Grateful

[*] posted on 9-2-2024 at 11:21
Looking for ideas for a new Synthesis Open Access Platform


Hey everyone!

I am seeking ideas for a potential new open access platform for synthesis and related resources that could provide the ability to capture syntheses from journals, researchers, individual contributors, and provide cross-referencing between reactions, reagents, products, etc. to provide a searchable database. I would eventually like to have educational content in the form of a collaboratively developed chemistry textbook.

Right now, it seems the easiest way to create a repository of chemicals would be to ingest data from ChemSpider and other data sources through their APIs into a structured format using Wikibase, and to bring in chemical data cartridges and toolkits, such as Bingo and Open Babel, to interpret the compounds by structure and functional group. This would allow for easy search, comparison, and navigation between them, and would be a starting point for building an ontology.

Integration of visual chemical editors like Ketcher, ChemDoodle, MarvinJS, would allow for the visual construction of compounds and reaction sequences that can be interpreted and further processed for correlation with other data in the repository, as well as using them to allow visual search for chemicals based on exact structure, substructure, and similar structures. This is similar to other systems currently in use.
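
As a rough illustration of the similar-structure search mentioned above, here is a minimal sketch of Tanimoto (Jaccard) scoring, a common way to rank structures by similarity. It assumes every compound already has a fingerprint, represented here as a set of integer feature IDs; in practice a cheminformatics toolkit would generate these. The compound names, fingerprint values, and the 0.5 threshold are made up for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similar_compounds(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical fingerprints; real ones would come from a toolkit, not by hand.
LIBRARY: Dict[str, Fingerprint] = {
    "compound_a": frozenset({1, 2, 3, 4}),
    "compound_b": frozenset({1, 2, 3, 9}),
    "compound_c": frozenset({7, 8}),
}

if __name__ == "__main__":
    for name, score in similar_compounds(frozenset({1, 2, 3, 4}), LIBRARY):
        print(f"{name}: {score:.2f}")
```

Exact and substructure search need real graph matching and are better left to a dedicated cartridge; only the ranking step is simple enough to show this way.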

Also, the ability to ingest PDFs from journals and other data sources, run OCR on it to extract text, as well as use IMAGO OCR to extract and identify chemical structures from drawings, and then further enrich the repository with this information. The PDFs could also be converted into wiki pages, or something along those lines, or a PDF viewer could be integrated to show the original PDF, though that might be problematic because of copyright. Once these PDFs have been ingested you would be able to either navigate from chemical to all data sources, or from an article with the mention of the chemical to its properties, reactions, alternatives, synthesis, articles, suppliers, etc.

There is a lot that could be done here, and it would be a lot of work. However, I have been thinking about this for the last several weeks and really like the idea.

The overall idea is to ingest or allow the development of chemical and synthesis content that could then be processed and linked with the other content in the repository. If you are looking at a synthesis, and need a specific reagent, you could navigate from that synthesis to the reagent to find its preparation, or possibly, alternatives based on similar reagents and reaction products.
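
The navigation from a synthesis to the preparation of one of its reagents could be sketched roughly as below. This is only an illustration under assumptions: the `Reaction` record, the function names, and the two sample entries (with "oxidant" as a placeholder reagent) are hypothetical and not part of any existing system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reaction:
    """Hypothetical minimal record for one synthesis in the repository."""
    name: str
    reactants: Tuple[str, ...]
    products: Tuple[str, ...]


def index_by_product(reactions: List[Reaction]) -> Dict[str, List[Reaction]]:
    """Map each compound to the reactions that produce it."""
    index: Dict[str, List[Reaction]] = defaultdict(list)
    for rxn in reactions:
        for product in rxn.products:
            index[product].append(rxn)
    return index


def preparations_for(
    rxn: Reaction, produced_by: Dict[str, List[Reaction]]
) -> Dict[str, List[str]]:
    """For each reactant of rxn, list the names of reactions that prepare it."""
    return {
        reactant: [prep.name for prep in produced_by.get(reactant, [])]
        for reactant in rxn.reactants
    }


# Illustrative sample data only.
REACTIONS = [
    Reaction("Fischer esterification", ("ethanol", "acetic acid"), ("ethyl acetate", "water")),
    Reaction("Oxidation of ethanol", ("ethanol", "oxidant"), ("acetic acid",)),
]

if __name__ == "__main__":
    print(preparations_for(REACTIONS[0], index_by_product(REACTIONS)))
```

Finding alternatives based on similar reagents would need a similarity measure on top of this lookup.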

What other features do you think would be useful? General comments are welcome too.

[Edited on 9-2-2024 by Loptr]




"Question everything generally thought to be obvious." - Dieter Rams
Sulaiman
International Hazard
*****




Posts: 3558
Registered: 8-2-2015
Location: 3rd rock from the sun
Member Is Offline


[*] posted on 9-2-2024 at 12:34


Might be 'better' to add to existing repositories such as orgsyn.org?



CAUTION : Hobby Chemist, not Professional or even Amateur
Loptr
[*] posted on 9-2-2024 at 13:33


Quote: Originally posted by Sulaiman  
Might be 'better' to add to existing repositories such as orgsyn.org?


How do you suggest I go about that? It's managed by an organization and grants.

Also, how would you go about contributing to it? I highly doubt they would accept anything that you submit. It's a repository of validated syntheses by professional chemists.

I was hoping to also extend this to the amateur community.




"Question everything generally thought to be obvious." - Dieter Rams
j_sum1
Administrator
********




Posts: 6225
Registered: 4-10-2014
Location: Unmoved
Member Is Offline

Mood: Organised

[*] posted on 9-2-2024 at 14:56


Sounds like you want to reproduce prepchem but add a bot that automatically harvests synths from journals and reliable sources.

I don't know how you would do this. Seems like a significant coding challenge. And distinguishing between reliable/reproducible and bonkers-conjecture will be quite a feat to pull off.
Loptr
[*] posted on 9-2-2024 at 16:00


Quote: Originally posted by j_sum1  
Sounds like you want to reproduce prepchem but add a bot that automatically harvests synths from journals and reliable sources.

I don't know how you would do this. Seems like a significant coding challenge. And distinguishing between reliable/reproducible and bonkers-conjecture will be quite a feat to pull off.


I am trying to take what's in my head and capture it in words. I will take some time tonight and try to lay it out better.

It wouldn't be duplicating prepchem, which is not a community. This would hopefully grow a community around it, with collaborative features such as accounts and roles, and allow that community to curate the information. As information is added, the community could tag various things in it so that the content can be interlinked.

For instance, say you have sodium ferrocyanide and want to know what reactions you could use it for: you would be able to click on sodium ferrocyanide and find every reaction in the repository where it has been tagged.
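
A minimal sketch of such a tag lookup follows. It assumes tags are plain compound names that are normalized and resolved through a small synonym table; the `TagIndex` class, the synonym entries, and the reaction IDs are hypothetical, and a real system would more likely key on a structure identifier than on names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical synonym table mapping alternative names to one canonical tag.
SYNONYMS: Dict[str, str] = {
    "sodium hexacyanoferrate(ii)": "sodium ferrocyanide",
    "yellow prussiate of soda": "sodium ferrocyanide",
}


def canonical(name: str) -> str:
    """Lower-case the name, collapse whitespace, then resolve known synonyms."""
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key, key)


class TagIndex:
    """Inverted index from a canonical compound tag to reaction IDs."""

    def __init__(self) -> None:
        self._reactions_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def tag(self, reaction_id: str, compounds: Iterable[str]) -> None:
        """Record that the reaction has been tagged with each compound."""
        for compound in compounds:
            self._reactions_by_tag[canonical(compound)].add(reaction_id)

    def reactions_for(self, compound: str) -> List[str]:
        """Return the sorted IDs of every reaction tagged with the compound."""
        return sorted(self._reactions_by_tag.get(canonical(compound), set()))


if __name__ == "__main__":
    index = TagIndex()
    index.tag("rxn-001", ["Sodium ferrocyanide", "iron(III) chloride"])
    index.tag("rxn-002", ["Yellow prussiate of soda", "sulfuric acid"])
    print(index.reactions_for("sodium ferrocyanide"))
```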

The other features I mentioned earlier would be down the road.




"Question everything generally thought to be obvious." - Dieter Rams
Texium
Administrator
********




Posts: 4508
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline

Mood: PhD candidate!

[*] posted on 9-2-2024 at 17:55


It sounds like you want to make SciFinder without the paywall



Come check out the Official Sciencemadness Wiki
They're not really active right now, but here's my YouTube channel and my blog.
Loptr
[*] posted on 9-2-2024 at 19:49


Quote: Originally posted by Texium  
It sounds like you want to make SciFinder without the paywall


I have never had access to SciFinder, so can't say for sure. :P

@Texium, what are some of the features of SciFinder? What can it do?

[Edited on 10-2-2024 by Loptr]




"Question everything generally thought to be obvious." - Dieter Rams
Texium
[*] posted on 9-2-2024 at 21:31


It does pretty much everything you described in the OP!
Quote: Originally posted by Loptr  
I am seeking ideas for a potential new open access platform for synthesis and related resources that could provide the ability to capture syntheses from journals, researchers, individual contributors, and provide cross-referencing between reactions, reagents, products, etc. to provide a searchable database.
It does this with patents and journals automatically. Pretty much as soon as a paper is published you can find it and all the reactions that it contains, including from the supplementary information.

Quote: Originally posted by Loptr  
Integration of visual chemical editors like Ketcher, ChemDoodle, MarvinJS, would allow for the visual construction of compounds and reaction sequences that can be interpreted and further processed for correlation with other data in the repository, as well as using them to allow visual search for chemicals based on exact structure, substructure, and similar structures. This is similar to other systems currently in use.
It has this functionality as well. You can search for a structure and it will pull up every reported reaction that it is found in. You can further limit it to reactant or product, and apply a myriad of other filters to find relevant results. You can also draw out a whole reaction scheme and search for any matches, and use variables in the structure drawings to get more general results. Substructure and similarity are options as well. Plus, it can also list suppliers that sell the chemicals you search for and the quantity and price they sell it in, though that isn’t as useful for home chemistry since most of them don’t sell to individuals.

Quote: Originally posted by Loptr  
Also, the ability to ingest PDFs from journals and other data sources, run OCR on it to extract text, as well as use IMAGO OCR to extract and identify chemical structures from drawings, and then further enrich the repository with this information... Once these PDFs have been ingested you would be able to either navigate from chemical to all data sources, or from an article with the mention of the chemical to its properties, reactions, alternatives, synthesis, articles, suppliers, etc.
I don’t know if it’s the same mechanism that you describe, but it does this too! When you view reaction search results, it’s clear that the conditions were automatically scraped from the publications, including their SIs, almost always quite accurately. When you view an article you have the option to “get reactions” and see all the schemes from the paper. Likewise, when you view a chemical, you can “get reactions” or “get references.” It’ll also directly provide you with spectroscopic data and physical properties of compounds if they are published.

Quote: Originally posted by Loptr  
The overall idea is to ingest or allow the development of chemical and synthesis content that could then be processed and linked with the other content in the repository. If you are looking at a synthesis, and need a specific reagent, you could navigate from that synthesis to the reagent to find its preparation, or possibly, alternatives based on similar reagents and reaction products.
Yeah, that is exactly what it is. Honestly, it’s such a powerful tool that I’ve been spoiled to have access to the last several years. It’s going to be hard to go back to not having it, so I would certainly support an endeavor to create an open-access alternative, though it would indeed be a colossal undertaking.



Come check out the Official Sciencemadness Wiki
They're not really active right now, but here's my YouTube channel and my blog.
bnull
Hazard to Others
***




Posts: 141
Registered: 15-1-2024
Location: Between the Atlantic and the Pacific Ocean
Member Is Offline

Mood: Feeling like a pinky-bending bluesman.

[*] posted on 10-2-2024 at 05:15


Quote: Originally posted by Texium  
Yeah, that is exactly what it is. Honestly, it’s such a powerful tool that I’ve been spoiled to have access to the last several years. It’s going to be hard to go back to not having it, so I would certainly support an endeavor to create an open-access alternative, though it would indeed be a colossal undertaking.


You lucky bastard... I got a glimpse of it the other day. Dammit. An open-access version would be amazing.

@Loptr, why don't you try contacting researchers from the universities closest to you? I think that if you discuss it with them, they'll offer advice and suggestions. It would be as useful to them as to any amateur chemist. If there were a free alternative almost as powerful, they would gladly ditch SciFinder the way some libraries ditched those expensively useless journal subscriptions.

Even so, it would take at least a couple of years to make it run smoothly, and you can't do it alone.




Quod scripsi, scripsi.

B. N. Ull
Loptr
[*] posted on 10-2-2024 at 08:32


Quote: Originally posted by bnull  
Quote: Originally posted by Texium  
Yeah, that is exactly what it is. Honestly, it’s such a powerful tool that I’ve been spoiled to have access to the last several years. It’s going to be hard to go back to not having it, so I would certainly support an endeavor to create an open-access alternative, though it would indeed be a colossal undertaking.


You lucky bastard... I got a glimpse of it the other day. Dammit. An open-access version would be amazing.

@Loptr, why don't you try contacting researchers from the universities closest to you? I think that if you discuss it with them, they'll offer advice and suggestions. It would be as useful to them as to any amateur chemist. If there were a free alternative almost as powerful, they would gladly ditch SciFinder the way some libraries ditched those expensively useless journal subscriptions.

Even so, it would take at least a couple of years to make it run smoothly, and you can't do it alone.


Yeah, I am well aware of that. I run a software development organization that contracts, and has commercial and IR&D projects as well.

I was thinking about starting small and trying to use as much of the available open source software as possible. There is quite a bit from what I can find.

I was reaching out to you all to see what else could be put on the wish list because I was mostly focusing on the technology, rather than the use case of what it would actually do. Most of my posts have been general statements about interlinking content and ingestion because that's what I am most familiar with professionally, and had an idea, but was trying to understand what was already being done by the other existing systems.

I think a combination of Wikibase and ingestion from ChemSpider to create a listing of reagents, along with other APIs to get additional details for each chemical, would be a good first start. That way you have pages that can be linked to from within reactions for reference. Adding a visual editor with the ability to import reagents from the repository would then give you the beginnings of a collaborative platform, one that lets individuals contribute reactions and that can understand (somewhat) the reactants, solvents, etc. for linkage. Allow some members to curate the content using annotations and notices for missing references or bogus content, and you have the basis for the Wikipedia business model.

The article ingestion can be done, but it would require fine-tuning and experimentation, with constant adjustment as article formats change. The idea would be to extract the text, determine paragraphs and their possible titles, extract images along with their relation to the paragraphs, and then convert everything into wikitext with basic formatting. You don't have to duplicate the article exactly. Once the content has been ingested, you process it with named-entity resolution against other content in the repository for linkage.
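
The text half of that pipeline might look something like the sketch below. It assumes the text has already been extracted from the PDF, uses a crude heuristic (short, single-line, no closing punctuation) to guess titles, and stands in for named-entity resolution with a plain case-insensitive match against known page titles. The function names and sample text are hypothetical, and image handling is left out entirely.

```python
import re
from typing import Iterable, List


def looks_like_title(block: str, max_words: int = 8) -> bool:
    """Heuristic guess: a short single line without closing punctuation."""
    return (
        "\n" not in block
        and len(block.split()) <= max_words
        and not block.endswith((".", ":", ";", "?", "!"))
    )


def link_entities(text: str, known_pages: Iterable[str]) -> str:
    """Wrap mentions of known page titles in wiki links (case-insensitive)."""
    by_lower = {name.lower(): name for name in known_pages}
    if not by_lower:
        return text
    # Longest names first so a longer title wins over a shorter one it contains.
    names = sorted(by_lower, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )

    def replace(match: "re.Match[str]") -> str:
        shown = match.group(0)
        target = by_lower[shown.lower()]
        return f"[[{target}]]" if shown == target else f"[[{target}|{shown}]]"

    return pattern.sub(replace, text)


def to_wikitext(raw_text: str, known_pages: Iterable[str]) -> str:
    """Turn blank-line-separated text into headings and linked paragraphs."""
    pages = list(known_pages)
    blocks = [b.strip() for b in re.split(r"\n\s*\n", raw_text) if b.strip()]
    out: List[str] = []
    for block in blocks:
        if looks_like_title(block):
            out.append(f"== {block} ==")
        else:
            paragraph = " ".join(block.split())  # undo hard line wraps
            out.append(link_entities(paragraph, pages))
    return "\n\n".join(out)


if __name__ == "__main__":
    sample = (
        "Preparation of ethyl acetate\n\n"
        "Ethanol and acetic acid are heated\nwith a little sulfuric acid.\n"
    )
    print(to_wikitext(sample, ["ethanol", "acetic acid", "sulfuric acid"]))
```

Real articles will break the title heuristic regularly, which is where the constant adjustment comes in.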

Sounds easy, right? ;)




"Question everything generally thought to be obvious." - Dieter Rams
digga
Harmless
*




Posts: 39
Registered: 11-6-2018
Member Is Offline


[*] posted on 10-2-2024 at 08:54


Allow the user to maintain a stock list. Then show the user what can be made from it and what skills/equipment are needed. Assign each reaction keywords for searching. Allow users to add reactions to the list. For example:

p-Toluenesulfonic acid. Materials: purified toluene, concentrated sulfuric acid, bicarbonate. Gear: Dean-Stark apparatus. Skills: distillation, refluxing, filtration. Keywords: moderate, catalyst, useful.

Show reaction if stock list contains the ingredients AND the user wants easy to moderate.

Have a flag that also shows reactions for which you are missing some of the requirements, highlighting what you are missing.
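
A small sketch of this filter, under stated assumptions: difficulty is stored as its own field on an assumed easy/moderate/hard scale rather than as a keyword, and the `ReactionEntry` record, the `suggest` function, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

DIFFICULTY_ORDER = ("easy", "moderate", "hard")  # assumed scale


@dataclass(frozen=True)
class ReactionEntry:
    """Hypothetical record for one user-contributed reaction."""
    product: str
    materials: FrozenSet[str]
    gear: FrozenSet[str]
    skills: FrozenSet[str]
    difficulty: str
    keywords: FrozenSet[str] = frozenset()


def suggest(
    entries: List[ReactionEntry],
    stock: Set[str],
    max_difficulty: str = "moderate",
    include_incomplete: bool = False,
) -> List[Tuple[ReactionEntry, List[str]]]:
    """Return (entry, missing materials) pairs within the difficulty limit.

    Entries with missing materials appear only when include_incomplete is set.
    """
    limit = DIFFICULTY_ORDER.index(max_difficulty)
    results: List[Tuple[ReactionEntry, List[str]]] = []
    for entry in entries:
        if DIFFICULTY_ORDER.index(entry.difficulty) > limit:
            continue
        missing = sorted(entry.materials - stock)
        if not missing or include_incomplete:
            results.append((entry, missing))
    return results


# Illustrative entries; the second one is a pure placeholder.
ENTRIES = [
    ReactionEntry(
        product="p-toluenesulfonic acid",
        materials=frozenset({"toluene", "sulfuric acid", "bicarbonate"}),
        gear=frozenset({"Dean-Stark apparatus"}),
        skills=frozenset({"distillation", "refluxing", "filtration"}),
        difficulty="moderate",
        keywords=frozenset({"catalyst", "useful"}),
    ),
    ReactionEntry(
        product="placeholder product",
        materials=frozenset({"reagent X", "reagent Y"}),
        gear=frozenset(),
        skills=frozenset(),
        difficulty="hard",
    ),
]

if __name__ == "__main__":
    stock = {"toluene", "sulfuric acid"}
    for entry, missing in suggest(ENTRIES, stock, include_incomplete=True):
        print(entry.product, "missing:", missing)
```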


Forum members can add reactions.

This is a small project which will provide immediate reward by helping suggest projects.

Howzaboutit?

[Edited on 10-2-2024 by digga]
Loptr
[*] posted on 10-2-2024 at 16:21


Assuming that a given reaction has been captured with the list of chemicals needed, then yes, it would be fairly easy to identify products that you could produce. At that point it's just a matter of finding the reactions whose reactants are all contained in your stock. It could also list the reactions that would become possible if a required reactant were acquired, sorted by distance, that is, by the number of additional reactants needed.
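
As a sketch of the distance idea, assuming each reaction is reduced to a plain set of reactant names (the reaction IDs and the reagents A to E are placeholders), the ranking and the "what would this purchase unlock" check could look like this:

```python
from typing import Dict, List, Set, Tuple


def rank_by_missing(
    reactions: Dict[str, Set[str]], stock: Set[str]
) -> List[Tuple[int, str, List[str]]]:
    """Sort reactions by how many reactants are absent from the stock.

    A distance of 0 means every reactant is already on hand.
    """
    ranked = []
    for name, reactants in reactions.items():
        missing = sorted(reactants - stock)
        ranked.append((len(missing), name, missing))
    return sorted(ranked)


def unlocked_by(
    reactions: Dict[str, Set[str]], stock: Set[str], new_reagent: str
) -> List[str]:
    """Reactions not runnable now that become runnable once new_reagent is acquired."""
    extended = stock | {new_reagent}
    return sorted(
        name
        for name, reactants in reactions.items()
        if not reactants <= stock and reactants <= extended
    )


# Placeholder data for illustration.
REACTIONS: Dict[str, Set[str]] = {
    "rxn_1": {"A", "B"},
    "rxn_2": {"A", "C"},
    "rxn_3": {"C", "D", "E"},
}

if __name__ == "__main__":
    stock = {"A", "B"}
    for distance, name, missing in rank_by_missing(REACTIONS, stock):
        print(distance, name, missing)
    print(unlocked_by(REACTIONS, stock, "C"))
```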

Would that be considered a generally beneficial feature? It sounds like it would be great for amateur chemists, but not something that would greatly benefit academics or industry, which isn't necessarily a requirement. Maybe part of an exploratory mode.




"Question everything generally thought to be obvious." - Dieter Rams
clearly_not_atara
International Hazard
*****




Posts: 2693
Registered: 3-11-2013
Member Is Offline

Mood: Big

[*] posted on 10-2-2024 at 18:23


I think I would avoid trying to be all things to all people. One thing I've thought would be useful would be a repository of published reactions (papers or patents) that have been successfully replicated by amateurs.



[Edited on 04-20-1969 by clearly_not_atara]
Loptr
[*] posted on 10-2-2024 at 18:54


Quote: Originally posted by clearly_not_atara  
I think I would avoid trying to be all things to all people. One thing I've thought would be useful would be a repository of published reactions (papers or patents) that have been successfully replicated by amateurs.


That's a very wise point.

If you try to be good at everything, you will end up not being good at anything.

I am mostly just talking through ideas with the community at this point. All suggestions welcome.




"Question everything generally thought to be obvious." - Dieter Rams