From Google’s Artificial Intelligence Built an AI That Outperforms Any Made by Humans
In May 2017, researchers at Google Brain announced the creation of AutoML, an artificial intelligence (AI) that’s capable of generating its own AIs. More recently, they decided to present AutoML with its biggest challenge to date, and the AI that can build AI created a “child” that outperformed all of its human-made counterparts.
AutoML acts as a controller neural network that develops a child AI network for a specific task. For this particular child AI, which the researchers called NASNet, the task was recognizing objects — people, cars, traffic lights, handbags, backpacks, etc. — in a video in real time.
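To make the controller/child idea concrete, here is a minimal, heavily simplified sketch of that loop: a controller samples candidate child architectures from a search space, each candidate is scored on a validation set, and the score nudges future sampling. The search space, the `sample_architecture` and `train_and_evaluate` helpers, and the preference update are hypothetical stand-ins for illustration, not Google's actual AutoML code.

```python
import random

# Hypothetical, tiny search space standing in for the real NASNet one.
SEARCH_SPACE = {
    "num_layers": [4, 8, 12],
    "filters": [32, 64, 128],
    "kernel_size": [3, 5],
}

def sample_architecture(preferences):
    """Controller step: sample one choice per dimension, biased by learned preferences."""
    arch = {}
    for dim, options in SEARCH_SPACE.items():
        weights = [preferences[(dim, o)] for o in options]
        arch[dim] = random.choices(options, weights=weights, k=1)[0]
    return arch

def train_and_evaluate(arch):
    """Placeholder for training the child network and measuring validation accuracy."""
    # Pretend deeper/wider children do slightly better, plus noise.
    return 0.5 + 0.001 * arch["num_layers"] + 0.0005 * arch["filters"] + random.gauss(0, 0.01)

# Start with uniform preferences over every (dimension, option) pair.
preferences = {(d, o): 1.0 for d, opts in SEARCH_SPACE.items() for o in opts}

best_arch, best_score = None, float("-inf")
for step in range(50):
    arch = sample_architecture(preferences)
    score = train_and_evaluate(arch)          # reward signal fed back to the controller
    for dim, choice in arch.items():          # crude reinforcement: reward good choices
        preferences[(dim, choice)] += max(score - 0.5, 0.0)
    if score > best_score:
        best_arch, best_score = arch, score

print("best child architecture:", best_arch, "validation score: %.3f" % best_score)
```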
NASNet was 82.7 percent accurate at classifying images on ImageNet’s validation set. This is 1.2 percent better than any previously published result. On object detection, the system also beat the previous state of the art by 4 percent, reaching a 43.1 percent mean Average Precision (mAP).
The Google researchers acknowledge that NASNet could prove useful for a wide range of applications and have open-sourced the AI for inference on image classification and object detection. “We hope that the larger machine learning community will be able to build on these models to address multitudes of computer vision problems we have not yet imagined,” they wrote in their blog post.
Though the applications for NASNet and AutoML are plentiful, the creation of an AI that can build AI does raise some concerns. For instance, what’s to prevent the parent from passing down unwanted biases to its child? What if AutoML creates systems so fast that society can’t keep up?
We are waiting for a human-level artificial intelligence to be developed, to see if it will improve itself to the point of becoming a superintelligence. Maybe that moment is exceptionally close.
From Google Brain chief: Deep learning takes at least 100,000 examples | VentureBeat
“I would say pretty much any business that has tens or hundreds of thousands of customer interactions has enough scale to start thinking about using these sorts of things,” Jeff Dean, a senior fellow at Google, said in an onstage interview at the VB Summit in Berkeley, California. “If you only have 10 examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”
From [1705.08421] AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 57.6k movie clips with actions localized in space and time, resulting in 210k action labels with multiple labels per human occurring frequently. The main differences with existing video datasets are: the definition of atomic visual actions, which avoids collecting data for each and every complex action; precise spatio-temporal annotations with possibly multiple annotations for each human; the use of diverse, realistic video material (movies). This departs from existing datasets for spatio-temporal action recognition, such as JHMDB and UCF datasets, which provide annotations for at most 24 composite actions, such as basketball dunk, captured in specific environments, i.e., basketball court.
We implement a state-of-the-art approach for action localization. Despite this, the performance on our dataset remains low and underscores the need for developing new approaches for video understanding. The AVA dataset is the first step in this direction, and enables the measurement of performance and progress in realistic scenarios.
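To see what “multiple spatio-temporal labels per person” looks like in practice, here is a small sketch that parses annotation rows in an AVA-like CSV layout (video id, keyframe timestamp, normalized box coordinates, action id, person id) and groups them per person per keyframe. The column order and the sample values are assumptions for illustration, not the dataset’s official specification.

```python
import csv
from collections import defaultdict
from io import StringIO

# Illustrative rows in an AVA-like layout (assumed column order):
# video_id, timestamp_sec, x1, y1, x2, y2, action_id, person_id
SAMPLE = """\
vid_0001,902,0.077,0.151,0.283,0.811,80,1
vid_0001,902,0.077,0.151,0.283,0.811,12,1
vid_0001,903,0.332,0.194,0.581,0.923,17,2
"""

def load_annotations(fileobj):
    """Group action labels per (video, keyframe, person): one box, many atomic actions."""
    grouped = defaultdict(lambda: {"box": None, "actions": []})
    for row in csv.reader(fileobj):
        video_id, ts, x1, y1, x2, y2, action_id, person_id = row
        key = (video_id, float(ts), int(person_id))
        grouped[key]["box"] = tuple(float(v) for v in (x1, y1, x2, y2))
        grouped[key]["actions"].append(int(action_id))
    return grouped

for (video_id, ts, person_id), ann in load_annotations(StringIO(SAMPLE)).items():
    print(f"{video_id} @ {ts}s person {person_id}: box={ann['box']} actions={ann['actions']}")
```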
From Google built a dataset to teach its artificial intelligence how humans hug, cook, and fight — Quartz
Google, which owns YouTube, announced on Oct. 19 a new dataset of film clips, designed to teach machines how humans move in the world. Called AVA, or “atomic visual actions,” the videos aren’t anything special to human eyes—they’re three second clips of people drinking water and cooking curated from YouTube. But each clip is bundled with a file that outlines the person that a machine learning algorithm should watch, as well as a description of their pose, and whether they’re interacting with another human or object. It’s the digital version of pointing at a dog with a child and coaching them by saying, “dog.”
This technology could help Google to analyze the years of video it processes on YouTube every day. It could be applied to better target advertising based on whether you’re watching a video of people talk or fight, or in content moderation. The eventual goal is to teach computers social visual intelligence, the authors write in an accompanying research paper, which means “understanding what humans are doing, what might they do next, and what they are trying to achieve.”
Google’s video dataset is free.
In 2015, I speculated on Twitter:
I wonder if @google already has enough @youtube videos to create a video version of Wikipedia (and if they already are machine learning it)
From PathNet: Evolution Channels Gradient Descent in Super Neural Networks
For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting.
PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks.
Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function.
We demonstrate successful transfer learning: fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A.
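The tournament-selection mechanism described in the abstract can be sketched in a few lines: each genotype is a path (a handful of module indices per layer), two genotypes are compared, and the winner’s path overwrites the loser’s with some mutation. The toy network dimensions and the fitness function below are stand-ins, not the paper’s actual setup.

```python
import random

LAYERS = 3              # layers in the "super network"
MODULES_PER_LAYER = 10  # candidate modules per layer
PATH_WIDTH = 3          # modules a path may use in each layer

def random_path():
    """A genotype: for each layer, the indices of the modules this path re-uses."""
    return [random.sample(range(MODULES_PER_LAYER), PATH_WIDTH) for _ in range(LAYERS)]

def fitness(path):
    """Toy stand-in for 'train along this path and measure task performance'."""
    # Pretend low-index modules happen to hold useful parameters for this task.
    return -sum(sum(layer) for layer in path) + random.gauss(0, 1.0)

def mutate(path, rate=0.2):
    """Occasionally swap a module in a layer for a randomly chosen one."""
    new_path = []
    for layer in path:
        layer = [random.randrange(MODULES_PER_LAYER) if random.random() < rate else m
                 for m in layer]
        new_path.append(layer)
    return new_path

# Tournament selection over a small population of paths.
population = [random_path() for _ in range(8)]
for _ in range(200):
    a, b = random.sample(range(len(population)), 2)
    if fitness(population[a]) >= fitness(population[b]):
        population[b] = mutate(population[a])   # winner overwrites loser, with mutation
    else:
        population[a] = mutate(population[b])

best = max(population, key=fitness)
print("best path (module indices per layer):", best)
```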
From Joseph Redmon: How computers learn to recognize objects instantly | TED.com
Joseph Redmon works on the YOLO (You Only Look Once) system, an open-source method of object detection that can identify objects in images and video — from zebras to stop signs — with lightning-quick speed. In a remarkable live demo, Redmon shows off this important step forward for applications like self-driving cars, robotics and even cancer detection.
A few years ago, on my personal Twitter account, I suggested that a side benefit for Google of owning YouTube would be having the largest archive of human activities on video to train its AI. What Redmon did here is what I had in mind at that time.
By the way, the demonstration during the TED talk is impressive.
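For readers who want to try something in the spirit of that demo, one accessible route is running a pre-trained YOLO model through OpenCV’s DNN module, roughly as sketched below. The file names and the confidence threshold are assumptions on my part; this is not Redmon’s demo code, just a minimal way to reproduce the idea on a single image.

```python
import cv2  # OpenCV built with the dnn module
import numpy as np

# Assumed local files: a Darknet config, pre-trained weights, and the class-name list.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
classes = open("coco.names").read().strip().split("\n")

image = cv2.imread("street.jpg")
h, w = image.shape[:2]

# YOLO expects a square, normalized input blob; 416x416 is a common choice.
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(list(net.getUnconnectedOutLayersNames()))

for output in outputs:
    for detection in output:
        scores = detection[5:]                 # class scores follow box + objectness
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:                   # assumed threshold
            cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
            print(f"{classes[class_id]}: {confidence:.2f} at centre ({cx:.0f}, {cy:.0f})")
```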
From Google Glass 2.0 Is a Startling Second Act | WIRED
Companies testing EE—including giants like GE, Boeing, DHL, and Volkswagen—have measured huge gains in productivity and noticeable improvements in quality. What started as pilot projects are now morphing into plans for widespread adoption in these corporations. Other businesses, like medical practices, are introducing Enterprise Edition in their workplaces to transform previously cumbersome tasks.
For starters, it makes the technology completely accessible for those who wear prescription lenses. The camera button, which sits at the hinge of the frame, does double duty as a release switch to remove the electronics part of the unit (called the Glass Pod) from the frame. You can then connect it to safety glasses for the factory floor—EE now offers OSHA-certified safety shields—or frames that look like regular eyewear. (A former division of 3M has been manufacturing these specially for Enterprise Edition; if EE catches on, one might expect other frame vendors, from Warby Parker to Ray-Ban, to develop their own versions.)
Other improvements include beefed-up networking—not only faster and more reliable wifi, but also adherence to more rigorous security standards—and a faster processor as well. The battery life has been extended—essential for those who want to work through a complete eight-hour shift without recharging. (More intense usage, like constant streaming, still calls for an external battery.) The camera was upgraded from five megapixels to eight. And for the first time, a red light goes on when video is being recorded.
If Glass EE gains traction, and I believe it will if it evolves into a platform for enterprise apps, Google will gain a huge amount of information and experience that it can reuse for the AR contact lenses currently in the works.
From Anna Patterson talks Gradient Ventures, Google’s new AI fund | TechCrunch
It’s been pretty obvious for a few months now, but Google has finally admitted that it’s running its own investment fund targeting machine intelligence startups. The fund will go by the name Gradient Ventures and provide capital, resources and education to AI-first startups.
Google isn’t disclosing the size of the fund, but the company told us that it’s being run directly off of Google’s balance sheet and will have the flexibility to follow on when it makes sense. This is in contrast to GV (formerly Google Ventures) and CapitalG, which operate as independent funds.
AI is the first technology in a long time posing a real threat to Google’s dominance. In other words, artificial intelligence is the best bet for a newcomer to become the next Google. No surprise that Google wants to spot that newcomer as early as possible.
From PAIR: the People + AI Research Initiative
Today we’re announcing the People + AI Research initiative (PAIR) which brings together researchers across Google to study and redesign the ways people interact with AI systems. The goal of PAIR is to focus on the “human side” of AI: the relationship between users and technology, the new applications it enables, and how to make it broadly inclusive. The goal isn’t just to publish research; we’re also releasing open source tools for researchers and other experts to use.
From Google Lens offers a snapshot of the future for augmented reality and AI | AndroidAuthority
At the recent I/O 2017, Google stated that we were at an inflection point with vision. In other words, it’s now more possible than ever before for a computer to look at a scene, dig out the details, and understand what’s going on. Hence: Google Lens.

This improvement comes courtesy of machine learning, which allows companies like Google to acquire huge amounts of data and then create systems that utilize that data in useful ways. This is the same technology underlying voice assistants and, to a lesser extent, even your recommendations on Spotify.