@sagrudd

One of the highlights of my academic career was a simple comparative genomics platform, initially called Sputnik and later openSputnik. The software was used to massage collections of Sanger-based cDNA sequences through a computational pipeline that assembled them into unigenes and then annotated those unigenes. The results were used to prepare summary statistics and reports, and to support comparative genomics in the absence of a complete genome sequence.

The openSputnik software died when I moved into the pharmaceutical industry in 2006, where I was actively discouraged from continuing my academic pursuits. I am not aware of copies of the software being available anywhere; it would, I suspect, be rather embarrassing to see the code that was hacked together to facilitate publications rather than support…

In an effort to be a little more active in my coding ambitions, I have decided to bring the project back to life, but from a third-generation DNA sequencing perspective. A quick survey of the bioinformatics literature shows an abundance of quality workflow software for cDNA sequence analysis, but most of it is aimed at quantitative analysis. I see a gap in the bioinformatics universe where something akin to the earlier openSputnik concept could live: not reinventing wheels, but building further on workflow technologies such as Nextflow (or Snakemake). As I am a vehement advocate of R, we'll do the data science coding in R where possible.

I have created a GitHub repository for the new Kwangmyŏngsŏng-3 Unit 2 project. I am not working to any timeline; this is just a background exercise to keep me engaged through the current COVID lockdown. Please watch this space!

Image attribution: the gwangmyeongseong3 image shown above was sourced from this page. The inclusion of this image and link is not an endorsement of any commentary; I simply love the lodestar name for the satellite.

Kwangmyŏngsŏng-3; redux from an earlier comparative genomics toolbox