Database systems are notorious for being hard to use. It is even more difficult to integrate data from multiple sources and collaborate on large data sets with people outside your organization. Without an easy way to give all the collaborators access to the same server, data sets get copied, emailed, and FTP'd, resulting in multiple versions that quickly fall out of sync.
Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focused on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussing the data, querying, visualization, and Web publishing. We plan to add new features to the system iteratively as we get feedback from users.
In the version we're launching today, you can upload tabular data sets (right now, we're supporting up to 100 MB per data set, 250 MB of data per user) and share them with your collaborators or with the world. You can choose to share all of your data with your collaborators, or keep parts of it hidden. You can even share different portions of your data with different collaborators.
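To make "share different portions of your data with different collaborators" concrete, here is a minimal sketch of per-collaborator column sharing, assuming a simple in-memory model. The post doesn't say whether sharing works per column, per row, or both; this sketch assumes column-level grants. `SharedTable`, `share`, and `view_for` are hypothetical names for illustration, not the Fusion Tables API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedTable:
    """Hypothetical table whose columns are shared selectively (not the Fusion Tables API)."""
    columns: list[str]
    rows: list[dict[str, object]]
    # Maps each collaborator ID to the set of columns that collaborator may see.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def share(self, collaborator: str, cols: set[str]) -> None:
        """Grant a collaborator access to a subset of the table's columns."""
        unknown = cols - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self.grants[collaborator] = set(cols)

    def view_for(self, collaborator: str) -> list[dict[str, object]]:
        """Return only the granted columns; a collaborator with no grant sees no rows."""
        if collaborator not in self.grants:
            return []
        allowed = [c for c in self.columns if c in self.grants[collaborator]]
        return [{c: row[c] for c in allowed} for row in self.rows]


# Illustrative data: the owner hides the contact column from one collaborator.
table = SharedTable(
    columns=["site", "region", "contact_email"],
    rows=[{"site": "Well 1", "region": "North", "contact_email": "owner@example.org"}],
)
table.share("alice", {"site", "region", "contact_email"})
table.share("bob", {"site", "region"})
print(table.view_for("bob"))  # [{'site': 'Well 1', 'region': 'North'}]
```

Keeping grants as a separate mapping means the underlying rows are stored once, so an in-place edit is immediately visible in every collaborator's view.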
When you edit the data in place, your collaborators always get the latest version. The attribution feature means your data will get credit for its contribution to any data set built with it. And yes, you can export your data back out of the cloud as CSV files.
Want to understand your data better? You can filter and aggregate the data, and you can visualize it on Google Maps or with other visualizations from the Google Visualization API. In this example, an intensity map of the world shows countries that won more than 10 gold medals in the Summer Olympics. You can then embed these visualizations in other properties on the Web (e.g., blogs and discussion groups) by simply pasting some HTML code we provide you.
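The filter-and-aggregate step behind the medal map can be sketched in plain Python. This is a sketch under stated assumptions: the rows are made-up placeholders rather than real Olympic results, gold medals are assumed to be summed across Games, and `total_by` and `having_more_than` are hypothetical helpers. In SQL terms it corresponds to grouping by country and keeping groups where the gold total exceeds 10.

```python
from collections import defaultdict

# Illustrative rows only: hypothetical countries and gold-medal counts per Games.
medals = [
    {"country": "Country A", "games": 2000, "gold": 6},
    {"country": "Country A", "games": 2004, "gold": 7},
    {"country": "Country B", "games": 2000, "gold": 4},
    {"country": "Country B", "games": 2004, "gold": 3},
    {"country": "Country C", "games": 2004, "gold": 11},
]


def total_by(rows, group_col, value_col):
    """Aggregate step: sum value_col for each distinct value of group_col."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


def having_more_than(totals, threshold):
    """Filter step: keep only groups whose total exceeds the threshold."""
    return {key: total for key, total in totals.items() if total > threshold}


print(having_more_than(total_by(medals, "country", "gold"), 10))
# {'Country A': 13, 'Country C': 11}
```

The aggregation runs before the filter because the threshold applies to each country's total, not to any single row.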
The power of data is truly harnessed when you combine data from multiple sources. For example, consider combining data about access to fresh water in various countries with data about malaria rates in those countries, or, as shown here, displaying three sources of GDP data side by side. Fusion Tables enables you to fuse multiple data sets when they are about the same entities. In database terms, this is a join on a primary key, except that the data originates from multiple independent sources. This is just the start; more join capabilities are coming soon.
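Fusing several GDP sources side by side amounts to joining them on a shared key. A minimal sketch, assuming each source is a list of row dictionaries that all carry a `country` column; `fuse` and the source names are hypothetical, and this is not how Fusion Tables itself implements merging. The post doesn't say what happens to entities that appear in only some sources, so this sketch assumes a full outer join that keeps them and fills the gaps with `None`.

```python
from __future__ import annotations


def fuse(key: str, sources: dict[str, list[dict]]) -> list[dict]:
    """Full outer join of several tables on a shared key column.

    Each non-key column is prefixed with its source name so values from
    different sources sit side by side instead of overwriting each other.
    If one source repeats a key, its later row wins.
    """
    columns: list[str] = []
    merged: dict[object, dict] = {}
    for name, rows in sources.items():
        for row in rows:
            entry = merged.setdefault(row[key], {})
            for col, value in row.items():
                if col == key:
                    continue
                qualified = f"{name}.{col}"
                if qualified not in columns:
                    columns.append(qualified)
                entry[qualified] = value
    # Entities missing from a source get None in that source's columns.
    return [{key: k, **{c: entry.get(c) for c in columns}} for k, entry in merged.items()]


# Placeholder numbers, not real GDP figures.
gdp = fuse("country", {
    "source_a": [{"country": "X", "gdp": 1.0}, {"country": "Y", "gdp": 2.0}],
    "source_b": [{"country": "X", "gdp": 1.1}],
    "source_c": [{"country": "Y", "gdp": 2.2}],
})
for row in gdp:
    print(row)
# {'country': 'X', 'source_a.gdp': 1.0, 'source_b.gdp': 1.1, 'source_c.gdp': None}
# {'country': 'Y', 'source_a.gdp': 2.0, 'source_b.gdp': None, 'source_c.gdp': 2.2}
```

Prefixing columns with the source name is what lets disagreeing values from independent sources coexist in one row, which is the point of comparing them side by side.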
But Fusion Tables doesn't require you and your collaborators to stop there. What if you don't agree on all of the values? Or need to understand the assumptions behind the data better? Fusion Tables enables you to discuss data at different levels of granularity: you can discuss individual rows, columns, or even individual cells. If a collaborator with edit permission changes data during the discussion, viewers will see the change as part of the discussion trail.
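One way to model discussions at row, column, and cell granularity is to key each thread by a target that names a row, a column, or both, and to append edits to the same trail as comments. A minimal sketch, assuming integer row IDs and string column names; `Target`, `Discussion`, and their methods are hypothetical names, not a documented Fusion Tables interface, and permission checks are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Target:
    """What a discussion is attached to: a row, a column, or a single cell."""
    row: Optional[int] = None     # set alone: discuss a whole row
    column: Optional[str] = None  # set alone: a whole column; both set: one cell

    def __post_init__(self) -> None:
        if self.row is None and self.column is None:
            raise ValueError("a target needs a row, a column, or both")


@dataclass
class Discussion:
    """Hypothetical discussion store; edits land in the same trail as comments."""
    trails: dict[Target, list[str]] = field(default_factory=dict)

    def comment(self, target: Target, author: str, text: str) -> None:
        self.trails.setdefault(target, []).append(f"{author}: {text}")

    def record_edit(self, target: Target, editor: str, old: object, new: object) -> None:
        # Assumes the caller already checked that the editor has edit permission.
        self.trails.setdefault(target, []).append(f"{editor} changed {old!r} to {new!r}")

    def trail(self, target: Target) -> list[str]:
        return list(self.trails.get(target, []))


cell = Target(row=3, column="gdp")
talk = Discussion()
talk.comment(cell, "ana", "This value looks too high.")
talk.record_edit(cell, "ben", 1.5, 1.2)
for entry in talk.trail(cell):
    print(entry)
# ana: This value looks too high.
# ben changed 1.5 to 1.2
```

Making `Target` frozen keeps it hashable, so a row, a column, and a cell each get their own independent thread.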
We hope you find Fusion Tables useful. As usual with first releases, we realize there is much missing, and we look forward to hearing your feedback.
35 comments:
and we get one step closer to Engelbart's vision...!
The opportunities to integrate vast amounts of data, maintain the integrity/sourcing and provide a space for collaboration -- even visual integration -- are immense. Thank you! Also, we're delighted that you've chosen fresh water as an example. Over at Circle of Blue (http://www.circleofblue.org), we'll be pulling in the many dormant data tables that have been seeking this new life, and we'll be looking for the obvious and not-so-obvious relationships that will lead to better journalism, science and decision-making regarding global water issues, from water scarcity to disease to infrastructure. We're honored to have been a part, albeit extremely small, in this huge project.
Ooh, this is going to be fun. Nice work!
Love the concept, but who is the target audience? People or application?
Very exciting! Keep up the good work!
Can't wait to try it out and see how we can implement it for local government.
Nice feature! More control over the plotting would be even cooler! Well done!
Very interesting. I'd be interested to see how much data you can manipulate in the browser before it blows up.
Sorting and aggregating are where data tools show their strengths (or lack thereof), so it'll be interesting to see what you can do there.
I made a table, why is it so difficult to make edited cell values stick?
http://tables.googlelabs.com/DataSource?dsrcid=13891/13891
Interesting:~)
Cool direction, but there's still a long way to go. :)
This sounds like nothing more than database automation. You take the basic concepts of ORM, create functionality to automatically find your joins (not hard if you name your foreign key the same thing as the primary key of the table you are joining to), and plug in additional functionality on the back end, like notes, which is nothing more than an additional table.
Not really revolutionary but if it gets popular, it WILL kill 'Access'.
I tried to import a UTF-8 file, and it seems Fusion Tables doesn't support non-ASCII data in CSV format. Import from Google Docs worked well.
I wish you could drill down from a high level to a low level, especially when using visualizations...
Very nice.
This will be useful in so many ways!
I've posted a comparison of Fusion Tables and Linked Data on Go To Hellman
Is it FOSS?
Just read Eric's comment on his blog about the social aspects of Fusion and I think I get it now. Fusion isn't a REPLACEMENT for databases (though people are pitching it this way), it is a very powerful way of extending databases to add community driven development to them.
In the past, only the DBA (and maybe a couple other people) had the ability to make the changes and this was a good thing because people didn't understand the database. But on the other hand, the DBA doesn't always understand the business logic behind the DB whereas the people who use the data do.
This gives the people using the data a better way to view and make contributions which can be reviewed.
If I understand this properly, this can have a very positive effect as an open source project. Unfortunately, it only exists as a cloud driven database right now and not a separate open source project for databases.
I think I see potential for an open source project. :)
Thanks, that's a useful article.
very eager to try this, as a database newbie - I was just thinking it would be great to have a searchable glossary of terms for my site...
thank you!
@Xeno: and the time dimension? That's certainly not trivial.
@Google: please do not abandon the basic principle that's so crucial here. Do no evil. I am fairly certain that this technology along with other collaborative technologies (HTML 5.0) are about to completely transform the world. It is absolutely necessary that we fight for an open, free, protected online environment instead of one that could be controlled by any sort of oligarchy. I commend you for this work, once again you have caught me by surprise. And remember, the principles of an open, free internet are as important as the technology behind it.
It appears that you are solving problems of:
1. Scalable joins between ad hoc data
2. Data consistency
3. Visualization of log files (seeing other people’s changes)
4. Accessibility to users without knowledge of SQL
My company ScaleDB is addressing problems #1 & #2 (the hard ones) through an advanced index model that facilitates ad hoc joins and a shared-disk database model that provides multiple nodes with access to shared data, to eliminate the data consistency issue.
--Mike
End users should be allowed to sort, based on the filter options we set for them.
Example:
ZIP Locator App
Search pet stores within 10 miles of my current ZIP code
I enter all known pet store addresses
Would it be possible to increase the limit? I published the usaspending.gov data on an Amazon Public Dataset, and would like to load that data here as well, but it is close to 200GB.
Didn't Dabble DB (http://dabbledb.com/) do something similar ... only they missed the collaboration boat.
Would like to see an easier way to do complex aggregation and merging though ...
Can't wait for the next set of changes :)
It is good to see that Google is finally entering the Data Visualization and Infographics space with Google Fusion. Data visualization is currently not available in the Google Apps platform. There are some 3rd party Google Docs gadgets for creating TreeMaps, Timelines, Pivot Tables, Maps and such, but nothing that is officially supported by Google.
There are many players in the consumer data visualization space – IBM Many Eyes, Chartle, Timetric, dabbledb, Microsoft Research Excel add-ins and others, but IBM Many Eyes is the best thus far. You can create amazing visualizations with Many Eyes.
In time, I am sure Google will be a strong contender in this technology space, but they have a long way to go.
Here is a nice article with map visualizations and a quick Fusion Table how-to tutorial: http://www.circleofblue.org/waternews/2009/world/google-brings-water-data-to-life/.
Is there an ODBC driver planned?
Thanks!!
I'd test the db with spatial data.
Very interesting. One question: is there a way to make dual Y-axis graphs? That would be really helpful for some of the performance engineering graphs.
Thanks again for this great tool!
I teach a database applications course at Marian University in Indianapolis. Fusion Tables are a great idea. One of the other commenters had it right: DBAs know the technology, but End Users know the business rules and what they want from their data. In today’s business environment, End Users increasingly have the technology skills as well. They also have to integrate information from many sources to get the answers they want. My only concern has to do with the security of data in the “cloud.” This Fall, I will be using Fusion Tables in my class.
Interesting, nice, and huge work! ;)
I second Herm's question...
Great tool for social-science research as well as for displaying data more visually:
http://irvillage.com/magazine/read/google-fusion-tables_9.html
Question for anyone who may know: does the Google Fusion Tables service allow data updates by registered users (of the site), or is it read-only for everyone except the owner/administrator? I want to create an online database that is searchable by all, updatable by those who are registered and have that privilege, and able to generate reports by comparing data entered (for searching) against the info in the database.
@hallberg5646
Click on the "Share" button in the top right-hand corner. It allows basic access control. It is not very granular, but it does the job.
Saqib