Google Research Blog
The latest news from Research at Google
Announcing the Google Internet of Things (IoT) Technology Research Award Pilot
Wednesday, February 10, 2016
Posted by Vint Cerf, Chief Internet Evangelist, and Max Senges, Google Research
Over the past year, Google engineers have experimented and developed a set of building blocks for the
Internet of Things
- an ecosystem of connected devices, services and “things” that promises direct and efficient support of one’s daily life. While there has been significant progress in this field, there remain significant challenges in terms of (1) interoperability and a standardized modular systems architecture, (2) privacy, security and user safety, as well as (3) how users interact with, manage and control an ensemble of devices in this connected environment.
It is in this context that we are happy to invite university researchers
to participate in the
Internet of Things (IoT) Technology Research Award Pilot
. This pilot provides selected researchers in-kind gifts of Google IoT related technologies (listed below), with the goal of fostering collaboration with the academic community on small-scale (~4-8 week) experiments, discovering what they can do with our software and devices.
We invite you to submit proposals in which Google IoT technologies are used to (1) explore interesting use cases and innovative user interfaces, (2) address technical challenges as well as interoperability between devices and applications, or (3) experiment with new approaches to privacy, safety and security. Proposed projects should make use of one or a combination of these Google technologies:
Google beacon platform
- consisting of the open beacon format Eddystone and various client and cloud APIs, this platform allows developers to mark up the world to make your apps and devices work smarter by providing timely, contextual information.
The Physical Web - based on the Eddystone URL beacon format, the Physical Web is an approach designed to allow any smart device to interact with real world objects - a vending machine, a poster, a toy, a bus stop, a rental car - without having to download an app first.
Nearby Messages API
- a publish-subscribe API that lets you pass small binary payloads between internet-connected Android and iOS devices as well as with beacons registered with
Google's proximity beacon service.
Brillo and Weave - Brillo is an Android-based embedded OS that brings the simplicity and speed of mobile software development to IoT hardware to make it cost-effective to build a secure smart device, and to keep it updated over time. Weave is an open communications and interoperability platform for IoT devices that allows for easy connections to networks, smartphones (both Android and iOS), mobile apps, cloud services, and other smart devices.
OnHub - a communication hub for the Internet of Things supporting Bluetooth® Smart Ready, 802.15.4 and 802.11a/b/g/n/ac. It also allows you to quickly create a guest network and control the devices you want to share.
Google Cloud Platform IoT Solutions
- tools to scale connections, gather and make sense of data, and provide the reliable customer experiences that IoT hardware devices require.
- provides custom full screen apps for a purpose-built Chrome device, such as a guest registration desk, a library catalog station, or a point-of-sale system in a store.
- an open-source framework designed to make it easier to develop secure, multi-device user experiences, with or without an Internet connection.
Check out the
Ubiquity Dev Summit playlist
for more information on these platforms and their best practices.
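To make the beacon format mentioned above concrete, here is a small sketch of how an Eddystone-URL frame compresses a URL into a handful of bytes. The scheme-prefix and expansion tables are reproduced from memory of the public spec, so verify them against the specification before relying on this:

```python
# Scheme prefixes and URL expansions as recalled from the open
# Eddystone-URL specification (check the spec before relying on them).
SCHEMES = {"http://www.": 0x00, "https://www.": 0x01,
           "http://": 0x02, "https://": 0x03}
EXPANSIONS = {".com/": 0x00, ".org/": 0x01, ".edu/": 0x02, ".net/": 0x03,
              ".info/": 0x04, ".biz/": 0x05, ".gov/": 0x06,
              ".com": 0x07, ".org": 0x08, ".edu": 0x09, ".net": 0x0A,
              ".info": 0x0B, ".biz": 0x0C, ".gov": 0x0D}

def encode_eddystone_url(url, tx_power=-20):
    """Builds the Eddystone-URL frame body: frame type 0x10, TX power,
    a scheme-prefix byte, then the URL with common substrings
    collapsed into single expansion bytes."""
    # Longest scheme first so "https://www." wins over "https://".
    for scheme in sorted(SCHEMES, key=len, reverse=True):
        if url.startswith(scheme):
            frame = bytearray([0x10, tx_power & 0xFF, SCHEMES[scheme]])
            rest = url[len(scheme):]
            break
    else:
        raise ValueError("unsupported scheme")
    while rest:
        for text, code in sorted(EXPANSIONS.items(), key=lambda kv: -len(kv[0])):
            if rest.startswith(text):
                frame.append(code)
                rest = rest[len(text):]
                break
        else:
            frame.append(ord(rest[0]))
            rest = rest[1:]
    return bytes(frame)

frame = encode_eddystone_url("https://example.com/page")
# 0x10, TX power, 0x03 ("https://"), "example", 0x00 (".com/"), "page"
print(frame.hex())
```

The compression matters because the whole frame has to fit into a single small BLE advertising packet.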
Please submit your proposal here
by February 29th in order to be considered for an award. Proposals will be reviewed by researchers and product teams within Google. In addition to looking for impact and interesting ideas, priority will be given to research that can make immediate use of the available technologies. Selected proposals will be notified by the end of March 2016. If selected, the award will be subject to Google’s terms, and your use of Google technologies will be subject to the applicable Google terms of service.
To connect our physical world to the Internet is a broad and long-term challenge, one we hope to address by working with researchers across many disciplines and work practices. We are looking forward to the collaborative opportunity provided by this pilot, and learning about innovative applications you create for these new technologies.
The same eligibility conditions as for the Faculty Research Award Program apply.
AlphaGo: Mastering the ancient game of Go with Machine Learning
Wednesday, January 27, 2016
Posted by David Silver and Demis Hassabis, Google DeepMind
Games are a great testing ground for developing smarter, more flexible algorithms that have the ability to tackle problems in ways similar to humans. Creating programs that are able to play games better than the best humans has a long history - the first classic game mastered by a computer was
noughts and crosses
(also known as tic-tac-toe) in 1952 as a PhD candidate’s project. Then checkers fell in 1994. Chess was tackled by Deep Blue in 1997. The success isn’t limited to board games, either - IBM's Watson won first place on Jeopardy! in 2011, and in
2014 our own algorithms learned to play dozens of Atari games
just from the
raw pixel inputs.
But one game has thwarted A.I. research thus far: the ancient game of Go. Invented in China over 2500 years ago, Go is played by more than
40 million people worldwide
. The rules are simple: players take turns to place black or white stones on a board, trying to capture the opponent's stones or surround empty space to make points of territory.
Confucius wrote about the game, and its aesthetic beauty elevated it to one of the four essential arts required of any true Chinese scholar. The game is played primarily through intuition and feel, and because of its subtlety and intellectual depth it has captured the human imagination for centuries.
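The capture rule described above boils down to counting a group's "liberties" - the empty points adjacent to a connected group of stones; a group with none is captured. A small flood-fill sketch on a toy board (the board layout is made up for illustration):

```python
def liberties(board, row, col):
    """Count the liberties (adjacent empty points) of the stone group
    containing (row, col). board is a list of strings using '.' for
    empty, 'B' for black and 'W' for white; a group with zero
    liberties is captured."""
    colour = board[row][col]
    group, libs, stack = {(row, col)}, set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < len(board) and 0 <= nc < len(board[0]):
                if board[nr][nc] == '.':
                    libs.add((nr, nc))           # an empty neighbour is a liberty
                elif board[nr][nc] == colour and (nr, nc) not in group:
                    group.add((nr, nc))          # same-colour stones share liberties
                    stack.append((nr, nc))
    return len(libs)

board = [".B..",
         "BWB.",
         ".B..",
         "...."]
print(liberties(board, 1, 1))  # 0: the surrounded white stone is captured
```

The rules really are that small; the difficulty of Go lies entirely in the consequences.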
But as simple as the rules are, Go is a game of profound complexity. The search space in Go is vast -- more than a googol times larger than chess (a number greater than there are atoms in the universe!). As a result, traditional “brute force” AI methods -- which construct a search tree over all possible sequences of moves -- don’t have a chance in Go. To date, computers have played Go only as well as amateurs. Experts predicted it would be at least another decade until a computer could beat one of the world’s elite group of Go professionals.
We saw this as an irresistible challenge! We started building a system, AlphaGo, described in a paper published in Nature this week, that would overcome these barriers. The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two
deep neural networks
, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network”, predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network”, is then used to reduce the depth of the search tree -- estimating the winner in each position in place of searching all the way to the end of the game.
AlphaGo’s search algorithm is much more human-like than previous approaches. For example, when Deep Blue played chess, it searched by brute force over thousands of times more positions than AlphaGo. Instead, AlphaGo looks ahead by playing out the remainder of the game in its imagination, many times over - a technique known as
Monte-Carlo tree search
. But unlike previous Monte-Carlo programs, AlphaGo uses deep neural networks to guide its search. During each simulated game, the policy network suggests intelligent moves to play, while the value network astutely evaluates the position that is reached. Finally, AlphaGo chooses the move that is most successful in simulation.
We first trained the policy network on 30 million moves from games played by human experts, until it could predict the human move 57% of the time (the previous record before AlphaGo was 44%). But our goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and gradually improving them using a trial-and-error process known as reinforcement learning
. This approach led to much better policy networks, so strong in fact that the raw neural network (immediately, without any tree search at all) can defeat state-of-the-art Go programs that build enormous search trees.
These policy networks were in turn used to train the value networks, again by reinforcement learning from games of self-play. These value networks can evaluate any Go position and estimate the eventual winner - a problem so hard it was believed to be impossible.
Of course, all of this requires a huge amount of compute power, so we made extensive use of
Google Cloud Platform
, which enables researchers working on AI and Machine Learning to access elastic compute, storage and networking capacity on demand. In addition, new open source libraries for numerical computation using data flow graphs, such as
TensorFlow, allow researchers to efficiently deploy the computation needed for deep learning algorithms across multiple CPUs or GPUs.
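To illustrate what "numerical computation using data flow graphs" means, here is a toy graph evaluator. The real libraries are of course far richer - adding automatic differentiation and placement of subgraphs onto CPUs and GPUs - but the programming model is the same: build a graph of operations first, evaluate it later:

```python
class GraphNode:
    """A node in a toy data flow graph: an operation plus input nodes.
    Building the graph is separated from evaluating it, which is what
    lets a real framework place subgraphs on different devices."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self, cache=None):
        cache = {} if cache is None else cache
        if id(self) not in cache:  # each node is computed once, then reused
            args = [n.eval(cache) for n in self.inputs]
            cache[id(self)] = self.op(*args)
        return cache[id(self)]

const = lambda v: GraphNode(lambda: v)
add = lambda a, b: GraphNode(lambda x, y: x + y, a, b)
mul = lambda a, b: GraphNode(lambda x, y: x * y, a, b)

# y = (x * w) + b, the shape of a single linear neuron
x, w, b = const(3.0), const(2.0), const(1.0)
y = add(mul(x, w), b)
print(y.eval())  # 7.0
```

Because the graph is an explicit data structure, a scheduler can partition it, cache shared subexpressions, and run independent branches in parallel.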
So how strong is AlphaGo? To answer this question, we played a tournament between AlphaGo and the best of the rest - the top Go programs at the forefront of A.I. research. Using a single machine, AlphaGo won all but one of its 500 games against these programs. In fact, AlphaGo even beat those programs after giving them
a head start of four free moves
at the beginning of each game. A high-performance version of AlphaGo, distributed across many machines, was even stronger.
This figure from the Nature article shows the Elo ratings of AlphaGo (both single machine and distributed versions), the European champion Fan Hui (a professional 2-dan), and the strongest other Go programs, evaluated over thousands of games. Pale pink bars show the performance of other programs when given a four-move head start.
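The ratings in the figure follow the standard Elo model, in which a rating gap maps directly to an expected score (win probability, with draws counting half); a quick sketch:

```python
def elo_expected_score(r_a, r_b):
    """Expected score of a player rated r_a against a player rated r_b
    under the Elo model: each 400-point gap multiplies the odds by 10."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

print(elo_expected_score(2000, 2000))             # 0.5: equal ratings
print(round(elo_expected_score(2400, 2000), 3))   # 0.909: a 400-point edge
```

This is why even a modest-looking gap between the bars in the figure corresponds to a lopsided head-to-head record.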
It seemed that AlphaGo was ready for a greater challenge. So we invited the reigning 3-time European Go champion Fan Hui — an elite professional player who has devoted his life to Go since the age of 12 — to our London office for a challenge match. The match was played behind closed doors between October 5-9 last year. AlphaGo won by 5 games to 0 -- the first time a computer program has ever beaten a professional Go player.
AlphaGo’s next challenge will be to play the top Go player in the world over the last decade, Lee Sedol
. The match will take place this March in Seoul, South Korea. Lee Sedol is excited to take on the challenge saying, "I am privileged to be the one to play, but I am confident that I can win." It should prove to be a fascinating contest!
We are thrilled to have mastered Go and thus achieved one of the
grand challenges of AI
. However, the most significant aspect of all this for us is that AlphaGo isn’t just an expert system built with hand-crafted rules, but instead uses general machine learning techniques to allow it to improve itself, just by watching and playing games. While games are the perfect platform for developing and testing AI algorithms quickly and efficiently, ultimately we want to apply these techniques to important real-world problems. Because the methods we have used are general purpose, our hope is that one day they could be extended to help us address some of society’s toughest and most pressing problems, from climate modelling to complex disease analysis.
Teach Yourself Deep Learning with TensorFlow and Udacity
Thursday, January 21, 2016
Posted by Vincent Vanhoucke, Principal Research Scientist
Deep learning has become one of the hottest topics in machine learning in recent years. With TensorFlow, the deep learning platform that we recently released as an open-source project, our goal was to bring the capabilities of deep learning to everyone. So far, we are extremely excited by the uptake: more than 4000 users have forked it on GitHub
in just a few weeks, and the project has been starred more than 16000 times by
enthusiasts around the globe.
To help make deep learning even more accessible to engineers and data scientists at large, we are launching a new
Deep Learning Course
developed in collaboration with Udacity
. This short, intensive course provides you with all the basic tools and vocabulary to get started with deep learning, and walks you through how to use it to address some of the most common machine learning problems. It is also accompanied by interactive TensorFlow notebooks that directly mirror and implement the concepts introduced in the lectures.
The course consists of four lectures which provide a tour of the main building blocks that are used to solve problems ranging from image recognition to text analysis. The first lecture focuses on the basics that will be familiar to those already versed in machine learning: setting up your data and experimental protocol, and training simple classification models. The second lecture builds on these fundamentals to explore how these simple models can be made deeper, and more powerful, and explores all the scalability problems that come with that, in particular regularization and hyperparameter tuning. The third lecture is all about
convolutional networks and image recognition. The fourth and final lecture explores models for text and sequences in general, with embeddings and
recurrent neural networks
. By the end of the course, you will have implemented and trained this variety of models on your own machine and will be ready to transfer that knowledge to solve your own problems!
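To give a flavor of the kind of fundamentals the second lecture covers, here is a pure-Python sketch of training a simple classifier with an L2 regularization penalty. This is a toy, not course material; the dataset, learning rate and penalty are made up:

```python
import math

def sgd_logistic_l2(data, lr=0.5, l2=0.01, epochs=200):
    """Train a 2-feature logistic classifier by SGD with an L2 penalty.
    The penalty term shrinks the weights toward zero at each step,
    which is one common form of the regularization discussed above."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            g = p - y  # gradient of the log loss w.r.t. the logit
            for j in range(2):
                w[j] -= lr * (g * x[j] + l2 * w[j])  # loss + L2 gradient
            b -= lr * g
    return w, b

# Tiny linearly separable toy set: label is 1 when x0 + x1 > 1.
data = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((0.0, 1.0), 0),
        ((1.0, 1.0), 1), ((2.0, 1.0), 1), ((1.0, 2.0), 1)]
w, b = sgd_logistic_l2(data)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
print([predict(x) for x, _ in data])
```

The hyperparameter tuning the lecture mentions is exactly the business of choosing values like `lr` and `l2` well.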
Our overall goal in designing this course was to provide the machine learning enthusiast a rapid and direct path to solving real and interesting problems with deep learning techniques, and we're now very excited to share what we've built! It has been a lot of fun putting it together with the fantastic team of experts in online course design and production at Udacity. For more details, see the
Udacity blog post
or register for the course
. We hope you enjoy it!
Why attend USENIX Enigma?
Monday, January 11, 2016
Posted by Parisa Tabriz, Security Princess & Enigma Program Co-Chair
Last year, we announced USENIX Enigma
, a new conference intended to shine a light on great, thought-provoking research in security, privacy, and electronic crime. With Enigma beginning in just a few short weeks, I wanted to share a couple of the reasons I’m personally excited about this new conference.
Enigma aims to bridge the divide that exists between experts working in academia, industry, and public service, explicitly bringing researchers from different sectors together to share their work. Our speakers include those spearheading the defense of digital rights (
Electronic Frontier Foundation
), practitioners at a number of well-known industry leaders, and researchers from multiple universities in the U.S. and abroad. With the diverse
session topics and organizations represented
, I expect interesting—and perhaps spirited—coffee break and lunchtime discussions among the equally diverse list of conference attendees.
Of course, I’m very proud to have some of my Google colleagues speaking at Enigma:
Adrienne Porter Felt will talk about blending research and engineering to solve usable security problems. You’ll hear how Chrome’s usable security team runs user studies and experiments to motivate engineering and design decisions. Adrienne will share the challenges they’ve faced when trying to adapt existing usable security research to practice, and give insight into how they’ve achieved successes.
Ben Hawkes will be speaking about Project Zero, a security research team dedicated to the mission of “making 0day hard.” Ben will talk about why Project Zero exists, and some of the recent trends and technologies that make vulnerability discovery and exploitation fundamentally harder.
Kostya Serebryany will be presenting a 3-pronged approach to securing C++ code, based on his many years of experience wrangling complex, buggy software. Kostya will survey the multiple dynamic sanitizing tools he and his team have made publicly available, review control-flow and data-flow guided fuzzing, and explain a method to harden your code in the presence of any bugs that remain.
Elie Bursztein will go through key lessons the Gmail team learned over the past 11 years while protecting users from spam, phishing, malware, and web attacks. Illustrated with concrete numbers and examples from one of the largest email systems on the planet, attendees will gain insight into specific techniques and approaches useful in fighting abuse and securing their online services.
In addition to raw content, my Program Co-Chair and I have prioritized talk quality. Researchers dedicate months or years of their time to thinking about a problem and conducting the technical work of research, but a common criticism of technical conferences is that the actual presentation of that research seems like an afterthought. Rather than be a regurgitation of a research paper in slide format, a presentation is an opportunity for a researcher to explain the context and impact of their work in their own voice; a chance to inspire the audience to want to learn more or dig deeper. With that in mind, Enigma will have shorter presentations, and the program committee has worked with each speaker to help them craft the best version of their talk.
Hope to see some of you at Enigma
later this month!
Four years of Schema.org - Recent Progress and Looking Forward
Thursday, December 17, 2015
Posted by Ramanathan Guha, Google Fellow
Four years ago, we announced schema.org
, a new initiative from Google, Bing and Yahoo! to create and support a common vocabulary for structured data markup on web pages. Since that time,
schema.org has been a resource for webmasters looking to add markup to their pages so that search engines can use that data to index content better and surface it in new experiences like rich snippets and Knowledge Graph panels. The schema.org vocabulary, which provides a growing set of terms for describing many kinds of entities in terms of properties and relationships, has become increasingly important as the Web transitions to a multi-device, mobile-oriented world. We are now seeing schema.org used on many millions of Web sites, defining data types and properties common across applications, platforms and products, in order to enhance the user experience by delivering the most relevant information users need, when they need it.
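One of the syntaxes schema.org supports is JSON-LD, carried in `script` blocks of type `application/ld+json`, which a consuming application can extract with nothing but the standard library. The page below is a made-up example:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects schema.org JSON-LD blocks from
    <script type="application/ld+json"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)   # script bodies may arrive in chunks

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.items.append(json.loads("".join(self.buffer)))
            self.in_jsonld, self.buffer = False, []

# A hypothetical page carrying Recipe markup:
page = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Recipe",
 "name": "Banana bread", "cookTime": "PT1H"}
</script>
</head><body>...</body></html>"""

extractor = JsonLdExtractor()
extractor.feed(page)
print(extractor.items[0]["@type"])  # Recipe
```

The `@type` and property names come from the shared vocabulary, which is what lets search engines and other consumers interpret pages they have never seen before.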
Schema.org in Google Rich Snippets
Schema.org in Google Knowledge Graph panels
Schema.org in Recipe carousels
In Schema.org: Evolution of Structured Data on the Web, an overview article published this week by the ACM, we report some key schema.org adoption metrics from a sample of 10 billion pages drawn from a combination of the Google index and Web Data Commons. In this sample, 31.3% of pages have schema.org markup, up from 22% one year ago. Structured data markup is now a core part of the modern web.
The schema.org group at W3C is now amongst the largest active W3C communities, serving as a hub for diverse groups exploring schemas covering topics such as sports, healthcare, e-commerce, food packaging, bibliography and digital archive management. Other companies also make use of the same data to build different applications, and as new use cases arise, further schemas are integrated via the community group at W3C. Each of these topics in turn has subtle inter-relationships - for example, the schemas for food packaging, flight reservations, recipes and restaurant menus each take a different approach to describing food restrictions and allergies. Rather than try to force a common unified approach across these domains, schema.org's evolution is pragmatic, driven by the combination of available Web data and the likelihood of mainstream consuming applications.
Schema.org is also finding new kinds of uses. One exciting line of work is the use of schema.org marked up pages as training corpus for machine learning. John Foley, Michael Bendersky and Vanja Josifovski used schema.org data
to build a system
that can learn to recognize events that may be geographically local to a particular user. Other researchers are looking at using schema.org pages with similar markup, but in different languages, to automatically create parallel corpora for machine translation.
Four years after its launch, Schema.org is entering its next phase, with more of the vocabulary development taking place in a more distributed fashion, as extensions. As
schema.org adoption has grown, a number of groups with more specialized vocabularies have expressed interest in extending it with their terms. Examples of this include real estate, product, finance, medical and bibliographic information. A number of extensions, for topics ranging from automobiles to product details, are already underway. In such a model, schema.org itself is just the core, providing a unifying vocabulary and congregation forum as necessary.
Text-to-Speech for low resource languages (episode 2): Building a parametric voice
Tuesday, December 15, 2015
Posted by Alexander Gutkin, Google Speech Team
This is the second episode in the series of posts reporting on the work we are doing to build text-to-speech (TTS) systems for low resource languages. In the first episode
, we described the crowdsourced data collection effort for Project Unison. In this episode, we describe our work to construct a parametric voice based on that data.
In that first episode, we described building TTS systems for low resource languages, and how one of the objectives of data collection for such systems was to quickly build a database representing multiple speakers. There are two main justifications for this approach. First, professional voice talents are often not available for under-resourced languages, so we need to record ordinary people, who tire of reading tedious text rather quickly. Hence, the amount of text a person can record is rather limited, and we need multiple speakers for a reasonably sized database that can be used by others as well. Second, we wanted to be able to create a voice that sounds human but is not identifiable as a real person. Various concatenative approaches to speech synthesis, such as unit selection, are not very suitable for this problem, because the selection algorithm may join acoustic units from different speakers, generating a very unnatural sounding result.
Adopting parametric speech synthesis techniques is an attractive way to make use of the multi-speaker corpora described above. This is because in parametric synthesis the training stage of the statistical component takes care of multiple speakers by estimating an averaged representation of the various acoustic parameters of each individual speaker. Depending on the number of speakers in the corpus, their acoustic similarity and the ratio of speaker genders, the resulting acoustic model can represent an average voice that is indistinguishable from human and yet cannot be traced back to any of the actual speakers recorded during the data collection.
We decided to use two different approaches to acoustic modeling in our experiments. The first approach uses
Hidden Markov Models (HMMs). This approach was pioneered by
Prof. Keiichi Tokuda
at Nagoya Institute of Technology, Japan and has been widely adopted in academia and industry. It is also supported by a dedicated open-source HMM synthesis toolkit. The resulting models are small enough to fit on mobile devices.
The second approach relies on
Recurrent Neural Networks (RNNs) and vocoders
that jointly mimic the human speech production system. Vocoders mimic the vocal apparatus to provide a parametric representation of speech audio that is amenable to statistical mapping. RNNs provide a statistical mapping from the text to the audio and have feedback loops in their topology, allowing them to model temporal dependencies
in human speech. In 2015, Yannis Agiomyrgiannakis proposed
a vocoder that outperforms the state-of-the-art technology in speed as well as quality. In 2013, Heiga Zen, Andrew Senior and Mike Schuster
proposed a neural network-based model
that mimics deep structure of human speech production for speech synthesis. The model has further been extended into a
Long Short-Term Memory
(LSTM) RNN. This allows long term memorization, which is good for speech applications. Earlier this year, Heiga Zen and Hasim Sak
described the LSTM RNN architecture
that has been specifically designed for fast speech synthesis. The LSTM RNNs are also used in our Automatic Speech Recognition (ASR) systems
recently mentioned in our blog.
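The "long term memorization" that makes LSTMs attractive here comes from the gated cell state. A minimal single-unit sketch follows; real speech models use vectors and thousands of units, and the weights below are illustrative, not trained:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit (scalar) LSTM cell."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g       # cell state: the long-term memory
    h = o * math.tanh(c)         # hidden state passed to the next step
    return h, c

# With the forget gate saturated open and the input gate shut, the cell
# state survives a step almost unchanged: long term memorization.
w = {"wi": 0.0, "ui": 0.0, "bi": -20.0,
     "wf": 0.0, "uf": 0.0, "bf": 20.0,
     "wo": 0.0, "uo": 0.0, "bo": 0.0,
     "wg": 1.0, "ug": 0.0, "bg": 0.0}
h, c = lstm_step(1.0, 0.0, 0.7, w)
print(abs(c - 0.7) < 1e-6)  # True
```

Because the gates are learned, the network itself decides what to carry across time steps, which is exactly what long-range prosody and coarticulation in speech require.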
Using the Hidden Markov Model (HMM) and LSTM RNN synthesizers described above, we experimented with a multi-speaker
Bangla corpus totaling 1526 utterances (waveforms and corresponding transcriptions) from five different speakers. We also built a third system that utilizes an LSTM RNN acoustic model, but this time we made it small and fast enough to run on a mobile phone.
We synthesized the following Bangla sentence "এটি একটি বাংলা বাক্যের উদাহরণ", translated from “This is an example sentence in Bangla”. Though the
HMM synthesizer output
can sound intelligible, it does exhibit some classic downsides with a voice that sounds buzzy and muffled. With the LSTM RNN configuration for mobile devices, the
output sounds clearer and has improved intonation over the HMM version. We also tried an LSTM RNN configuration with more network nodes (and thus not suitable for low-end mobile devices) to generate
audio - the quality is slightly better but is not a huge improvement over the more lightweight LSTM RNN version. We hypothesize that this is due to the fact that a neural network with many nodes has more parameters and thus requires more data to train.
These early results are encouraging for several reasons. First, they confirm that natural-sounding speech synthesis based on multiple speakers is practically possible. It is also significant that the total number of recordings used was relatively small, yet we were able to build intelligible parametric speech synthesis. This means that it is possible to collect training data for such a speech synthesizer by engaging the help of volunteers who are not professional voice artists, each for a short period of time. Using multiple volunteers is an advantage: it results in more diverse data, and the resulting synthetic voice does not represent any specific individual. This approach may well be the foundation for bringing speech technology to many more traditionally under-served languages.
NEXT UP: But can it say, “Google”? (Ep.3)
Making online learning even easier with a re-envisioned Course Builder
Monday, December 14, 2015
Posted by Adam Feldman, Product Manager and Pavel Simakov, Technical Lead, Course Builder Team
(Cross-posted on the
Google for Education blog)
The Course Builder team believes in enabling new and better ways to learn (for both the instructor and learner). Today's release of
Course Builder v1.10
furthers these goals in three ways, by being easier to use, embeddable and applicable to more types of content.
Easier to use
We took a step back and re-envisioned the menus and navigation of the administrative interface based on the steps instructors take as they create a course. These are designed to help you through the process of creating, styling, publishing and managing your courses. This re-imagined design gives a solid foundation for future versions of Course Builder.
A completely redesigned navigation simplifies content authoring and configuration.
To support this redesign, we’ve also completely revamped our documentation. There’s now one home for all of Course Builder’s materials:
Google Open Online Education
. Here, you’ll find everything you need to conceptualize and construct your content, create a course using Course Builder, and even develop new modules to extend Course Builder’s capabilities. The content now reflects the latest features and organization.
Embeddable assessment support
We started with embeddable assessments because evaluation is so important to learning, but we don’t plan to stop there. Watch for additional embeddable components in the future.
Applicable to more types of content
Many types of online learning content, like tutorials, exercises and documentation, are a lot like online courses. For instance, they might involve presenting content to users, having them do exercises or assessments and allowing them to stop and return later. Yet, you might not think of them as traditional courses.
To make Course Builder a better fit for a broader set of online content, we’ve added a new “guides” experience. Guides are a new way for students to browse and consume your content. Compared to typical online courses -- which can enforce a strict linear path (from unit 1 to unit 2, etc.) -- guides present your content as a non-numbered list. Users are free to enter and exit in any order. It also allows you to show the content for many courses together.
You could imagine each guide being a documentation page or tutorial section. Guides also work with any existing Course Builder units and can be made available by simply enabling that feature in the dashboard. Here are a couple of our courses, when viewed as guides:
Within each guide, the user is guided
through the steps, which could be portions of a docs page or lessons in a unit, as in this example from the “Power Searching with Google” sample course:
By letting users jump in and out of the content as they like, guides are ideally suited to the on-the-go learner and look great on phones and tablets. It’s our first foray into responsive mobile design... but it won’t be our last.
Guides currently support public courses, but we’ll be adding registration, enhanced statefulness and interface customization, as well as elements of dynamic learning (think of a personalized list of guides).
This release has focused on making Course Builder easier to use and more relevant. It sets up the framework to give future features a natural home. It adds embeddable assessments to make Course Builder useful in more places. And it introduces guides, a new, less linear format for consuming content.
For a full list of features, see the
release notes, and let us know what you think. Keep on learning!