Explore Viral Databases with VirionDB

Earlier this year, the BioNano team of Autodesk Research began a new project, VirionDB. VirionDB is an open-source (GitHub) portal that aggregates information from several public databases and lets newcomers and experts alike survey the thousands of sequenced viruses.

Why Another Repository?

Several existing databases already list viruses and information about them (NCBI, PDB, ViralZone, Virus-Host DB), and we gather information from many of them. So why make another database with all the same information?

We were looking for viruses that could serve as scaffolds for DNA origami. This query effectively reduced to looking for single-stranded DNA viruses (generally, Group II in the Baltimore classification) on the order of 10,000 bases long.

In existing databases, it is difficult to visualize metrics across all viruses, to compare viruses, or to understand the context of a particular virus. This is true of most repositories in the life sciences, not just of those for viruses.

What Makes a Good Repository?

Fundamentally, whether a repository is useful and successful is governed by social and fiscal aspects, not technical ones. This assessment can be reduced to a few core principles:

It provides the right information, for all users, accessibly.
It is trusted, recognized, correct, and complete.
It is updated, maintainable, active, and adaptive.
It is connected and connectable.
It is funded and stable.

As with standards, there is a temptation to build new repositories to address unmet needs, even in already crowded spaces. Some databases in virology, like the widely-trusted NCBI, have existed for many years, and it would cause fragmentation to add another.

However, quickly filtering and navigating large, flat sets of data remains difficult. Fortunately, we don’t have to make our own database to solve this problem.

VirionDB Isn’t a Repository.

VirionDB dynamically aggregates data from existing repositories, and does not maintain its own. It hopes to simply add to the virology tool ecosystem without leading to further fragmentation.

Querying for single-stranded DNA viruses around 10,000 bases long in VirionDB

VirionDB makes it easy to filter data by text or by characteristics of viruses. Categorical information is captured in pie charts and can be filtered using checklists, while continuous measurements are captured in line graphs and can be filtered through sliders. The relatively small size of the database (~7000 viruses) lets us run all querying and filtering in the web application, so moving through the data is snappy.
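To make the checklist-and-slider idea concrete, here is a minimal sketch of in-memory filtering. It is written in Python for readability and is not VirionDB's actual code; the records, the field names (`genome_type`, `genome_length`), and the helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical records. The field names are assumptions for illustration,
# not VirionDB's real schema.
VIRUSES = [
    {"name": "Virus A", "genome_type": "ssDNA", "genome_length": 9800},
    {"name": "Virus B", "genome_type": "dsDNA", "genome_length": 152000},
    {"name": "Virus C", "genome_type": "ssRNA", "genome_length": 9700},
    {"name": "Virus D", "genome_type": "ssDNA", "genome_length": 5400},
]


def apply_filters(records, checked=None, ranges=None):
    """Keep records that pass every checklist and every slider.

    checked: field -> set of allowed values (a checklist)
    ranges:  field -> (low, high), inclusive (a slider)
    """
    checked = checked or {}
    ranges = ranges or {}
    result = []
    for rec in records:
        in_categories = all(rec[f] in allowed for f, allowed in checked.items())
        in_ranges = all(lo <= rec[f] <= hi for f, (lo, hi) in ranges.items())
        if in_categories and in_ranges:
            result.append(rec)
    return result


def category_counts(records, field):
    """Count records per category, e.g. to size the slices of a pie chart."""
    return Counter(rec[field] for rec in records)


selected = apply_filters(
    VIRUSES,
    checked={"genome_type": {"ssDNA", "ssRNA"}},
    ranges={"genome_length": (5000, 10000)},
)
counts = category_counts(selected, "genome_type")
```

With only a few thousand records, a linear scan like this on every filter change is cheap, which is consistent with doing all the filtering in the application rather than on a server.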

Users can click into a virus to see deeper explanations about its characteristics, and sets of viruses can be compared side-by-side.

Viruses can be compared side-by-side, with details of each trait

Because VirionDB does not host its own data, when users wish to contribute feedback or corrections, the software links back to the appropriate databases. Developers who wish to filter our data programmatically can download a JSON file with our aggregated data.
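As a rough illustration of working with the aggregated data programmatically, the sketch below parses a small JSON sample and repeats our original query: single-stranded DNA viruses around 10,000 bases long. The JSON layout, the field names (`baltimore_group`, `genome_length`), and the 25% tolerance are assumptions made for this example; the real file may be structured differently.

```python
import json

# Stand-in for the downloaded file. The structure and field names here are
# assumptions; inspect the real JSON before relying on them.
SAMPLE_JSON = """
[
  {"name": "Example virus 1", "baltimore_group": "II", "genome_length": 10200},
  {"name": "Example virus 2", "baltimore_group": "II", "genome_length": 5386},
  {"name": "Example virus 3", "baltimore_group": "IV", "genome_length": 9600},
  {"name": "Example virus 4", "baltimore_group": "II", "genome_length": 8100},
  {"name": "Example virus 5", "baltimore_group": "II"}
]
"""


def find_scaffold_candidates(records, target=10000, tolerance=0.25):
    """Return Group II (ssDNA) records whose length is within tolerance of target."""
    low = target * (1 - tolerance)
    high = target * (1 + tolerance)
    return [
        rec
        for rec in records
        if rec.get("baltimore_group") == "II"
        # Records with no length recorded are treated as non-matching.
        and low <= rec.get("genome_length", 0) <= high
    ]


records = json.loads(SAMPLE_JSON)
candidates = find_scaffold_candidates(records)
candidate_names = [rec["name"] for rec in candidates]
```

For a file on disk, `json.load` on an open file handle would replace the `json.loads` call.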

What’s Next?

We still have some work to do to make it easy for users to contribute feedback and report errors to our data sources. Merging datasets from the existing databases also poses some challenges: many of them do not link to each other, and it is sometimes unclear which technique would appropriately merge the viruses from each source along with all of their traits.
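To show why merging is harder than it sounds, here is a sketch of one possible approach, not the technique VirionDB uses: join records on a shared identifier, keep the first value seen for each trait, and set aside disagreements and unmatched records for review. The `taxid` key, the source names, and the records are all hypothetical.

```python
def merge_sources(sources, key="taxid"):
    """Merge per-source record lists on a shared identifier.

    sources: source name -> list of records (dicts).
    Returns (merged, conflicts, unmatched):
      merged:    identifier -> combined record
      conflicts: (identifier, field, kept_value, other_value) tuples
      unmatched: (source name, record) pairs that lack the identifier
    """
    merged = {}
    conflicts = []
    unmatched = []
    for source_name, records in sources.items():
        for rec in records:
            ident = rec.get(key)
            if ident is None:
                # No shared identifier, so the record cannot be joined safely.
                unmatched.append((source_name, rec))
                continue
            target = merged.setdefault(ident, {key: ident})
            for field, value in rec.items():
                if field == key:
                    continue
                if field in target and target[field] != value:
                    # Sources disagree: keep the first value, log the other.
                    conflicts.append((ident, field, target[field], value))
                else:
                    target[field] = value
    return merged, conflicts, unmatched


# Hypothetical input. The identifiers are placeholders, not real taxonomy IDs.
sources = {
    "source_a": [
        {"taxid": 1, "name": "Virus A", "genome_length": 9800},
        {"taxid": 2, "name": "Virus B"},
    ],
    "source_b": [
        {"taxid": 1, "host": "bacteria", "genome_length": 9750},
        {"name": "Virus C", "host": "plants"},
    ],
}
merged, conflicts, unmatched = merge_sources(sources)
```

In this example the two sources disagree on the genome length of the first virus, and one record carries no identifier at all. Both cases end up in review lists rather than being resolved silently, which reflects how often the right answer depends on the trait and the source.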

Finally, there are a few repositories with which VirionDB is not integrated, and we’d love to work with them to incorporate their information.

