Posted by: woodforthetrees | March 30, 2010

How to build a bad biological database

Storing data is a simple task, isn’t it? Memory is relatively cheap and, after all, data wants to be free, doesn’t it? How hard can it be? Here are my ten tips for building a terrible database.

TIP ONE: Make submission difficult
Scientists are smart people, so there’s no need to waste time and money on usability. Eventually they’ll figure out how to get it right, and at least some of the important information will be submitted. Who cares if everyone submits to a rival database just because it’s easier to use?

TIP TWO: Have a support service that is available 9-5, Monday to Friday, GMT
After all, scientists are renowned for working 9-5, and the only science that matters is in Europe… isn’t it?

TIP THREE: Don’t let your file formats interconvert
Under no circumstances should data saved by one piece of equipment in its own file format be converted into a common, searchable one, or even be readable without proprietary software. In particular, ignore the pioneering work of the Open Microscopy Environment: format standardisation is for wimps.

TIP FOUR: Keep your database independent
Stand out from the crowd by ensuring your data do not link to other databases. Who wants their data to be found via a sequence search on GenBank or through links from UniProt? Data wants to be free, but it doesn’t necessarily want to be found.

TIP FIVE: Totally trust your automated systems
Books can be ordered on Amazon without any manual intervention, so why would a database need any? Most of the well-known biological databases have curators who check submissions, ensuring that they are as complete and accurate as possible. What a waste of money: nobody minds incomplete data sets, missing experimental conditions and the like.

TIP SIX: Do not provide a permanent, unique identifier
The PDB uses identifiers (e.g. 1ubq) and the Gene Expression Omnibus uses accession numbers, as do many other databases, but this looks like another hassle you don’t need. We all need a good place to bury bad data.

TIP SEVEN: Make sure reviewers can’t see raw data
Don’t devise a simple way for journal reviewers to check the data behind a paper going through peer review. Reviewers LOVE to receive emails with thousands of huge images attached.

TIP EIGHT: Include a 44-page getting-started guide
Scientists have lots of spare time and are very keen to read through a 44-page quick-start guide to your database, especially since you’ve followed tip one and ensured that the database is very difficult to use. Even better, provide a guide of at least 50 pages for reviewers. The only people less busy than your submitters are the reviewers. It’s a well-known fact.

TIP NINE: If you include a search option, make sure it only works in UK English
Or in US English, but certainly not in both. People foolish enough to search for crystallization and not crystallisation don’t deserve to find anything in a database.
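For anyone who would rather not follow tip nine, here is a minimal sketch, in Python, of one way a search could accept both spellings. It assumes a small, hypothetical lookup table of UK/US variant pairs (a real database would need a much larger curated list, or its search engine's own synonym support), and the function names are made up for illustration. Each query word is expanded to its known variants, and a record matches if it contains at least one variant of every query word.

```python
import re

# Hypothetical variant table (UK spelling -> US spelling); illustrative only.
UK_TO_US = {
    "crystallisation": "crystallization",
    "visualisation": "visualization",
    "standardisation": "standardization",
    "haemoglobin": "hemoglobin",
}
# Reverse mapping so a US-spelled query also finds UK-spelled records.
US_TO_UK = {us: uk for uk, us in UK_TO_US.items()}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def variants(word: str) -> set[str]:
    """Return the word plus any known UK/US spelling variant."""
    word = word.lower()
    found = {word}
    if word in UK_TO_US:
        found.add(UK_TO_US[word])
    if word in US_TO_UK:
        found.add(US_TO_UK[word])
    return found


def search(records: list[str], query: str) -> list[str]:
    """Return records containing every query word in at least one spelling."""
    groups = [variants(w) for w in re.findall(r"[a-z]+", query.lower())]
    return [r for r in records if all(g & tokenize(r) for g in groups)]


records = [
    "Crystallization of lysozyme at 4 C",
    "Protein visualisation toolkit",
    "Haemoglobin crystallisation screen",
]
print(search(records, "crystallisation"))
print(search(records, "hemoglobin crystallization"))
```

A fixed lookup table is deliberately blunt: blanket rules such as "replace -ise with -ize" would mangle words like "wise" or "noise", so an explicit list is the safer starting point.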

TIP TEN: Do not develop good visualisation tools
Scientists love data. Pages and pages and pages of it. Making it easy to see connections between different datasets would just make life too simple. Scientists love a challenge.


Responses

  1. […] Hodges has a fantastic post about building (bad) biological databases, a must read. The only point I might have a little nit about is Tip #5, Totally trust your […]

  2. This is great. I’m going to make it required reading for all submitters to the next NAR Database Issue
    (http://www.oxfordjournals.org/nar/database/a)

  3. #11: Publish a paper about your database in a high profile journal, then actually make it available 2 years later.

  4. #12: Take the server down for routine maintenance and don’t post a notice explaining this; instead, show a 403 Forbidden error.

    #13: Don’t back up the data before switching the server off.

    Oh, how I wish these weren’t true.

  5. #14 Allow raw dumps of the database, but only in PDF-formatted tables.

  6. Extend this beyond the biological databases and you get a bad bioinformatician:

    http://manuelcorpas.com/2009/01/29/138/

  7. #15 Do not follow the example of the INTEGRALL database.
    #16 Claim authorship each time your database is used.

  8. […] attempted to let people know how to make databases more interoperable and discoverable, but this blog takes a very different take on the idea. The ideas brought forward include making data silos, […]

