Tuesday, June 09, 2009
Database systems are notorious for being hard to use. It is even more difficult to integrate data from multiple sources and collaborate on large data sets with people outside your organization. Without an easy way to offer all the collaborators access to the same server, data sets get copied, emailed and ftp'd--resulting in multiple versions that get out of sync very quickly.
Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focusing on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and Web publishing. We plan to iteratively add new features to the systems as we get feedback from users.
In the version we're launching today, you can upload tabular data sets (right now, we're supporting up to 100 MB per data set, 250 MB of data per user) and share them with your collaborators or with the world. You can choose to share all of your data with your collaborators, or keep parts of it hidden. You can even share different portions of your data with different collaborators.
When you edit the data in place, your collaborators always get the latest version. The attribution feature means your data will get credit for its contribution to any data set built with it. And yes, you can export your data back out of the cloud as CSV files.
Want to understand your data better? You can filter and aggregate the data, and you can visualize it on Google Maps or with other visualizations from the Google Visualization API. In this example, an intensity map of the world shows countries that won more than 10 gold medals in the Summer Olympics. You can then embed these visualizations in other properties on the Web (e.g., blogs and discussion groups) by simply pasting some HTML code we provide you.
The power of data is truly harnessed when you combine data from multiple sources. For example, consider combining data about access to fresh water in various countries with data about malaria rates in those countries, or as shown here, showing three sources of GDP data side by side. Fusion Tables enables you to fuse multiple sets of data when they are about the same entities. In database speak, we call this a join on a primary key but the data originates from multiple independent sources. This is just the start, more join capabilities will come soon.
But Fusion Tables doesn't require you and your collaborators to stop there. What if you don't agree on all of the values? Or need to understand the assumptions behind the data better? Fusion Tables enables you to discuss data at different granularity levels -- you can discuss individual rows or columns or even individual cells. If a collaborator with edit permission changes data during the discussion, viewers will see the change as part of the discussion trail.
We hope you find Fusion Tables useful. As usual with first releases, we realize there is much missing, and we look forward to hearing your feedback.