How to build a bad biological database

Posted by: woodforthetrees | March 30, 2010

Storing data is a simple task, isn’t it? Memory is relatively cheap and, after all, data wants to be free, doesn’t it? How hard can it be? Here are my ten tips for building a terrible database.

TIP ONE Make submission difficult
Scientists are smart people, so there’s no need to waste time and money on usability issues. Eventually they’ll figure out how to get it right and at least some of the important information will be submitted. Who cares if everyone submits to a rival database, just because it’s easier to use?

TIP TWO Have a support service that is available 9-5 Mon to Fri GMT
After all, scientists are renowned for working 9-5 and the only science that matters is in Europe… isn’t it?

TIP THREE Don’t let your file formats interconvert
Under no circumstances should data from one piece of equipment, stored in its own specific file format, be converted into a common and searchable one, or even be readable without proprietary software. In particular, ignore the pioneering work of the Open Microscopy Environment: format standardisation is for wimps.

TIP FOUR Keep your database independent
Stand out from the crowd by ensuring your data do not link to other databases. Who wants their data to be found via a sequence search on GenBank or through links from UniProt? Data wants to be free but it doesn’t necessarily want to be found.

TIP FIVE Totally trust your automated systems
Books can be ordered on Amazon without any manual intervention, so why would it be needed for a database? Most of the well-known biological databases have curators who check the submissions, ensuring that they are complete and accurate as far as possible. What a waste of money – nobody minds incomplete data sets, missing experimental conditions, etc.
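For anyone tempted to ignore this tip, here is a minimal sketch of an automated completeness check that feeds curators rather than replacing them. The field names in `REQUIRED_FIELDS` and the two triage messages are hypothetical, chosen only for illustration; a real database would define its own minimum information checklist.

```python
# Hypothetical minimum set of fields a submission must carry.
REQUIRED_FIELDS = ("organism", "method", "temperature_c", "buffer_ph")


def missing_fields(submission: dict) -> list[str]:
    """Return the required fields that are absent or left empty."""
    return [
        field
        for field in REQUIRED_FIELDS
        if submission.get(field) in (None, "")
    ]


def triage(submission: dict) -> str:
    """Bounce obviously incomplete submissions; pass the rest to a human."""
    missing = missing_fields(submission)
    if missing:
        return "returned to submitter: missing " + ", ".join(missing)
    # The automated check only catches gaps; accuracy still needs a curator.
    return "queued for curator review"
```

The design choice is that the automated step never accepts anything on its own: it only saves the curator from chasing obviously missing experimental conditions.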

TIP SIX Do not provide a permanent, unique identifier
The PDB uses identifiers (e.g. 1ubq) and the Gene Expression Omnibus uses an accession number – as do many other databases, but this looks like another hassle you don’t need. We all need a good place to bury bad data.
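Should you weaken and decide to offer identifiers after all, the idea can be sketched in a few lines. This is an assumption-laden toy, not how the PDB or GEO actually mint their accessions: the `AccessionRegistry` class, the `BDB` prefix and the six-digit width are all made up for illustration. The one property it tries to show is that an identifier, once issued, is never reused, even when the record behind it is withdrawn.

```python
import itertools


class AccessionRegistry:
    """Toy registry that issues permanent, never-reused accession numbers."""

    def __init__(self, prefix: str = "BDB", width: int = 6) -> None:
        # "BDB" and the six-digit width are hypothetical choices.
        self._prefix = prefix
        self._width = width
        self._counter = itertools.count(1)
        self._records: dict[str, dict] = {}

    def deposit(self, data: dict) -> str:
        """Store a record and return its new accession number."""
        accession = f"{self._prefix}{next(self._counter):0{self._width}d}"
        self._records[accession] = {"data": data, "withdrawn": False}
        return accession

    def withdraw(self, accession: str) -> None:
        """Mark a record as withdrawn but keep the identifier resolvable."""
        self._records[accession]["withdrawn"] = True

    def resolve(self, accession: str) -> dict:
        """Look up a record; withdrawn entries still resolve to a tombstone."""
        return self._records[accession]
```

Withdrawn entries keep their accession and simply carry a `withdrawn` flag, so papers that cited them do not end up pointing at nothing, or worse, at someone else's data.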

TIP SEVEN Make sure reviewers can’t see raw data
Don’t devise a simple way for journal reviewers to check data that is part of a paper going through peer review. Reviewers LOVE to receive emails with thousands of huge images attached.

TIP EIGHT Include a 44-page getting started guide
Scientists have lots of spare time and are very keen to read through a 44-page quick-start guide to your database because you’ve followed tip one and ensured that the database is very difficult to use. Even better, provide at least a 50-page guide for reviewers. The only people less busy than your submitters are the reviewers. It’s a well-known fact.

TIP NINE If you include a search option, make sure it only works in UK English
Or in US English, but certainly not in both. People foolish enough to search for crystallization and not crystallisation don’t deserve to find anything in a database.
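If you insist on letting both camps find things, one possible approach is to normalise the query and the indexed text to a single spelling before comparing them. The sketch below is only that, a sketch: the `UK_TO_US` map is a deliberately tiny, hypothetical word list, and the suffix rule is crude enough to rewrite words like "promise" as well. Because the same rewrite is applied to both sides of the comparison, those over-eager rewrites should not stop a match.

```python
import re

# Hypothetical, deliberately tiny spelling map; a real search index would
# need a far longer list of UK/US variants.
UK_TO_US = {
    "colour": "color",
    "tumour": "tumor",
    "haemoglobin": "hemoglobin",
}

# Crude rule: rewrite -isation/-ise/-ised/-ises/-ising endings with a "z".
_ISE_SUFFIX = re.compile(r"is(ation|ations|e|ed|es|ing)$")


def normalise(term: str) -> str:
    """Map a single word to one canonical (US-style) spelling."""
    word = term.lower().strip()
    if word in UK_TO_US:
        return UK_TO_US[word]
    return _ISE_SUFFIX.sub(r"iz\1", word)


def search(query: str, records: list[str]) -> list[str]:
    """Return records containing a word that matches the query once both are normalised."""
    wanted = normalise(query)
    return [
        record
        for record in records
        if wanted in {normalise(w) for w in re.findall(r"[A-Za-z]+", record)}
    ]
```

With this in place, a search for "crystallisation" and a search for "crystallization" return the same records, which is exactly the sort of convenience a truly bad database should avoid.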

TIP TEN Do not develop good visualisation tools
Scientists love data. Pages and pages and pages of it. Making it easy to see connections between different datasets would just make it too simple. Scientists love a challenge.

Posted in database, Funding, PDB, Systems biology, Web 2.0 | Tags: best, bioinformatics, biology, database, worst

Responses

[…] Hodges has a fantastic post about building (bad) biological databases, a must read. The only point I might have a little nit about is Tip #5, Totally trust your […]
By: How not to build databases for biology on April 14, 2010
at 2:43 am

This is great. I’m going to make it required reading for all submitters to the next NAR Database Issue
(http://www.oxfordjournals.org/nar/database/a)
By: Michael Galperin on April 14, 2010
at 5:52 pm

#11: Publish a paper about your database in a high-profile journal, then actually make it available 2 years later.
By: dude on April 14, 2010
at 7:05 pm

#12: Take the server down for routine maintenance and don’t post a notice explaining this; instead, show a 403 Forbidden error.

#13: Don’t back up the data before switching the server off.

Oh, how I wish these weren’t true.
By: woodforthetrees on April 14, 2010
at 8:22 pm

#14 Allow raw dumps of the database, but only in PDF-formatted tables.
By: Benjamin Berman on April 14, 2010
at 9:28 pm

Extend this beyond biological databases and you get a bad bioinformatician:

http://manuelcorpas.com/2009/01/29/138/
By: Karyn M. on April 16, 2010
at 3:55 pm

#15 Do not follow the example of INTEGRALL database.
#16 Claim authorship each time your database is used
By: zdona2002@yahoo.fr on September 6, 2012
at 4:18 pm

[…] attempted to let people know how to make databases more interoperable and discoverable, but this blog takes a very different take on the idea. The ideas brought forward include making data silos, […]
By: NIF Blog » Blog Archive » How to make the most annoying biological database on November 4, 2012
at 7:00 pm

