One of the highlights of my academic career was a simple comparative genomics platform initially called Sputnik and later openSputnik. The software massaged a collection of Sanger-based cDNA sequences through a computational pipeline to prepare unigenes, and then annotated these unigenes to produce summary statistics and reports and to facilitate comparative genomics in the absence of a complete genome sequence.
The openSputnik software died when I moved into the pharmaceutical industry in 2006 and was actively discouraged from continuing my academic pursuits. I am not aware of copies of the software being available anywhere - it would, I suspect, be rather embarrassing to see the code that was hacked together to facilitate publications rather than support…
Wanting to be a little more active in my coding, I have decided to bring the
project back to life, but from a third-generation DNA sequencing perspective.
A quick survey of bioinformatics publications shows that there is an abundance
of quality workflow software for cDNA sequence analysis, but most of it is for
quantitative analysis. I see a gap in the bioinformatics universe where
something akin to the earlier openSputnik concept could live; not reinventing
wheels, but building on technologies such as
Nextflow (or Snakemake). I am a vehement advocate of R, so we’ll do the data
science coding in R where possible.
I have created a GitHub project for the new Kwangmyŏngsŏng-3 Unit 2 project. I am not working to any timelines - this is just a background exercise to keep me engaged through the current COVID lockdown. Please watch this space!
Image attribution - the gwangmyeongseong3 image shown above was sourced from
this page.
The inclusion of this image and link is not an endorsement of any commentary -
I love the lodestar name for the satellite.