A joint project between the University of California, Berkeley, Google Brain, and the Intel Corporation aims to teach robots how to perform sutures — using YouTube.
The AIs we can produce are still limited, but they are very good at rapidly processing large amounts of data. This makes them very useful for medical applications, such as their use in diagnosing Chinese patients during the early months of the pandemic. They’re also lending a digital hand towards finding a treatment and vaccine for the virus.
But actually taking part in a medical procedure isn’t something that they’ve been able to pull off. This work takes a step in that direction, showing how deep-learning can be applied to automatically create sutures in the operating room.
The team worked with a deep-learning setup called a Siamese network, built from two or more identical sub-networks that share the same weights and process different inputs side by side. One of the strengths of such networks is the ability to assess relationships between data, and they have been used for language detection, facial detection, and signature verification.
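To make the idea concrete, here is a minimal sketch of what makes a network “Siamese”: one embedding function, with one set of weights, is applied to both inputs, and the distance between the two embeddings measures how related they are. The weights, the `embed` and `distance` helpers, and the toy feature vectors are all illustrative assumptions, not the architecture used in the paper.

```python
import math

# Hypothetical fixed weights for a tiny one-layer embedding (2 outputs, 3 inputs).
# In a real Siamese network these would be learned; here they are made up.
SHARED_WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
]

def embed(x, weights=SHARED_WEIGHTS):
    """Map an input vector to an embedding using a linear layer plus tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def distance(a, b):
    """Euclidean distance between the embeddings of two inputs.

    Both inputs pass through the same weights, which is the defining
    property of a Siamese setup.
    """
    ea, eb = embed(a), embed(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ea, eb)))

frame_a = [1.0, 0.0, 0.5]    # toy features for one video frame
frame_b = [1.0, 0.1, 0.5]    # a nearly identical frame
frame_c = [-1.0, 2.0, 0.0]   # a very different frame

# Similar frames should end up closer together than dissimilar ones.
print(distance(frame_a, frame_b) < distance(frame_a, frame_c))
```

In practice the weights would be trained so that related inputs land close together and unrelated ones far apart; this sketch only shows the shared-weights comparison step.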
However, training AIs well requires massive amounts of data, and the team turned to YouTube to get it. As part of a previous project, the researchers tried to teach a robot to dance using videos. They used the same approach here, showing their network video footage of actual procedures. Their paper describes how they used YouTube videos to train a two-armed da Vinci surgical robot to insert needles and perform sutures on a cloth device.
“YouTube gets 500 hours of new material every minute. It’s an incredible repository,” said Ken Goldberg from UC Berkeley, co-author of the paper. “Any human can watch almost any one of those videos and make sense of it, but a robot currently cannot—they just see it as a stream of pixels.”
“So the goal of this work is to try and make sense of those pixels. That is to look at the video, analyze it, and be able to segment the videos into meaningful sequences.”
It took 78 instructional videos to train the AI to perform sutures with an 85% success rate, the team reports. Eventually, they hope, such robots could take over simple, repetitive tasks to allow surgeons to focus on more demanding work.
We’re nowhere near having a fully automated surgery team, but in time, the authors hope to build robots that can interact with and assist doctors during procedures.
The report “Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos” is available here.
If you want a machine to learn to do something intelligent you either have to program it or teach it to learn.
For decades, engineers have been programming machines to perform all sorts of tasks — from software that runs on your personal computer and smartphone to guidance control for space missions.
But although computers are generally much faster and more precise than the human brain at sequential tasks, such as adding numbers or calculating chess moves, such programs are very limited in their scope. Something as trivial as identifying a bicycle on a crowded pedestrian street or picking up a hot cup of coffee from a desk and gently moving it to the mouth can send a computer into convulsions, never mind conceptualization or abstraction (such as designing a computer itself).
The gist is that humans were never programmed (not like a digital computer, at least) — humans have become intelligent through learning.
Do machine learning and deep learning ring a bell? They should. These are not merely buzzwords: they’re techniques that have triggered a renaissance of artificial intelligence, leading to phenomenal advances in self-driving cars, facial recognition, and real-time speech translation.
Although AI systems seem to have appeared out of nowhere in the previous decade, the first seeds were sown as early as 1956 by John McCarthy, Claude Shannon, Nathan Rochester, and Marvin Minsky at the Dartmouth Conference. Concepts like artificial neural networks, deep learning, and neuro-symbolic AI are not new; scientists have been thinking about how to model computers after the human brain for a very long time. It’s only fairly recently that technology has developed the capacity to store huge amounts of data and deliver significant processing power, allowing AI systems to finally become practically useful.
But despite impressive advances, deep learning is still very far from replicating human intelligence. Sure, a machine capable of teaching itself to identify skin cancer better than doctors is great, don’t get me wrong, but there are also many flaws and limitations.
One important limitation is that deep learning algorithms and other machine learning neural networks are too narrow.
When you have huge amounts of carefully curated data, you can achieve remarkable things with it, such as superhuman accuracy and speed. By now, AIs have crushed top human players at one high-profile game after another, from chess to Jeopardy! and StarCraft.
However, their utility breaks down once they’re prompted to adapt to a more general task. What’s more, these narrow-focused systems are prone to error. For instance, take a look at the following picture of a “Teddy Bear” — or at least in the interpretation of a sophisticated modern AI.
These are just a couple of examples that illustrate that today’s systems don’t truly understand what they’re looking at. What’s more, artificial neural networks require enormous amounts of data to train, which is a huge problem in the industry right now. At the rate at which computational demand is growing, there will come a time when even all the energy that hits the planet from the sun won’t be enough to satiate our computing machines. Even so, despite being fed millions of pictures of animals, a machine can still mistake a furry cup for a teddy bear.
Meanwhile, the human brain can recognize and label objects effortlessly and with minimal training; often a single picture is enough. If you show a child a picture of an elephant, the very first time they’ve ever seen one, that child will instantly recognize that a) it is an animal and b) it is an elephant the next time they come across one, either in real life or in a picture.
This is why we need a middle ground: a broad AI that can multi-task and cover multiple domains, but which can also read data from a variety of sources (text, video, audio, etc.), whether the data is structured or unstructured. Enter the world of neuro-symbolic AI.
David Cox is the head of the MIT-IBM Watson AI Lab, a collaboration between IBM and MIT that will invest $250 million over ten years to advance fundamental research in artificial intelligence. One important avenue of research is neuro-symbolic AI.
“A neuro-symbolic AI system combines neural networks/deep learning with ideas from symbolic AI. A neural network is a special kind of machine learning algorithm that maps from inputs (like an image of an apple) to outputs (like the label “apple”, in the case of a neural network that recognizes objects). Symbolic AI is different; for instance, it provides a way to express all the knowledge we have about apples: an apple has parts (a stem and a body), it has properties like its color, it has an origin (it comes from an apple tree), and so on,” Cox told ZME Science.
“Symbolic AI allows you to use logic to reason about entities and their properties and relationships. Neuro-symbolic systems combine these two kinds of AI, using neural networks to bridge from the messiness of the real world to the world of symbols, and the two kinds of AI in many ways complement each other’s strengths and weaknesses. I think that any meaningful step toward general AI will have to include symbols or symbol-like representations,” he added.
By combining the two approaches, you end up with a system that has neural pattern recognition allowing it to see, while the symbolic part allows the system to logically reason about symbols, objects, and the relationships between them. Taken together, neuro-symbolic AI goes beyond what current deep learning systems are capable of doing.
“One of the reasons why humans are able to work with so few examples of a new thing is that we are able to break down an object into its parts and properties and then to reason about them. Many of today’s neural networks try to go straight from inputs (e.g. images of elephants) to outputs (e.g. the label “elephant”), with a black box in between. We think it is important to step through an intermediate stage where we decompose the scene into a structured, symbolic representation of parts, properties, and relationships,” Cox told ZME Science.
Here are some examples of questions that are trivial to answer by a human child but which can be highly challenging for AI systems solely predicated on neural networks.
Neural networks are trained to identify objects in a scene and interpret the natural language of various questions and answers (i.e. “What is the color of the sphere?”). The symbolic side recognizes concepts such as “objects,” “object attributes,” and “spatial relationship,” and uses this capability to answer questions about novel scenes that the AI had never encountered.
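The division of labor described above can be sketched in a few lines. This is only an illustration under stated assumptions: the neural part is replaced by a hard-coded stub called `perceive`, the scene contents are invented, and the helper names (`select`, `query`, `left_of`) are hypothetical rather than taken from any MIT-IBM system.

```python
def perceive(image_id):
    """Stand-in for the neural network: returns a symbolic scene description.

    A real system would infer these objects from pixels; this stub simply
    returns hand-written data so the symbolic step can be shown.
    """
    scenes = {
        "scene_1": [
            {"shape": "sphere", "color": "red", "size": "large", "x": 1},
            {"shape": "cube", "color": "blue", "size": "small", "x": 3},
            {"shape": "cylinder", "color": "green", "size": "small", "x": 5},
        ]
    }
    return scenes[image_id]

def select(objects, **attributes):
    """Symbolic filter: keep objects whose attributes match all conditions."""
    return [o for o in objects
            if all(o.get(k) == v for k, v in attributes.items())]

def query(objects, attribute):
    """Return the requested attribute of the single remaining object."""
    if len(objects) != 1:
        raise ValueError("question is ambiguous or has no answer")
    return objects[0][attribute]

def left_of(objects, reference):
    """Spatial relation: objects with a smaller x coordinate than reference."""
    return [o for o in objects if o["x"] < reference["x"]]

scene = perceive("scene_1")

# "What is the color of the sphere?"
print(query(select(scene, shape="sphere"), "color"))

# "What shape is the object to the left of the blue cube?"
cube = select(scene, shape="cube", color="blue")[0]
print(query(left_of(scene, cube), "shape"))
```

Because the reasoning step operates on symbols rather than pixels, the same `select` and `query` operations work unchanged on any new scene the perception step can describe.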
You could achieve a similar result to that of a neuro-symbolic system solely using neural networks, but the training data would have to be immense. Moreover, there’s always the risk that outlier cases, for which there is little or no training data, are answered poorly. In contrast, this hybrid approach boasts high data efficiency, in some instances requiring just 1% of the training data other methods need.
The next evolution in AI
Just as deep learning was waiting for data and computing power to catch up with its ideas, so symbolic AI has been waiting for neural networks to mature. And now that the two complementary technologies are ready to be synced, the industry could be in for another disruption, and things are moving fast.
“We’ve got over 50 collaborative projects running with MIT, all tackling hard questions at the frontiers of AI. We think that neuro-symbolic AI methods are going to be applicable in many areas, including computer vision, robot control, cybersecurity, and a host of other areas. We have projects in all of these areas, and we’ll be excited to share them as they mature,” Cox said.
But not everyone is convinced that this is the fastest road to achieving general artificial intelligence.
“I think that symbolic style reasoning is definitely something that is important for AI to capture. But, many people (myself included) believe that human abilities with symbolic logic emerge as a result of training, and are not convinced that an explicitly hard-wiring in symbolic systems is the right approach. I am more inclined to think that we should try to design artificial neural networks (ANNs) that can learn how to do symbolic processing. The reason is this: it is hard to know what should be represented by a symbol, predicate, etc., and I think we have to be able to learn that, so hard-wiring the system in this way is maybe not a good idea,” Blake Richards, who is an Assistant Professor in the Montreal Neurological Institute and the School of Computer Science at McGill University, told ZME Science.
Irina Rish, an Associate Professor in the Computer Science and Operations Research department at the Université de Montréal (UdeM), agrees that neuro-symbolic AI is worth pursuing but believes that “growing” symbolic reasoning out of neural networks may be more effective in the long run.
“We all agree that deep learning in its current form has many limitations including the need for large datasets. However, this can be either viewed as criticism of deep learning or the plan for future expansion of today’s deep learning towards more capabilities,” Rish said.
Rish sees current limitations surrounding ANNs as a ‘to-do’ list rather than a hard ceiling. Their dependence on large datasets for training can be mitigated by meta- and transfer-learning, for instance. What’s more, the researcher argues that many assumptions in the community about how to model human learning are rather flawed, calling for more interdisciplinary research.
“A common argument about “babies learning from a few samples unlike deep networks” is fundamentally flawed since it is unfair to compare an artificial neural network trained from scratch (random initialization, some ad-hoc architectures) with a highly structured, far-from-randomly initialized neural networks in baby’s brains, incorporating prior knowledge about the world, from millions of years of evolution in varying environments. Thus, more and more people in the deep learning community now believe that we must focus more on interdisciplinary research on the intersection of AI and other disciplines that have been studying brain and minds for centuries, including neuroscience, biology, cognitive psychology, philosophy, and related disciplines,” she said.
Rish points to exciting recent research that focuses on “developing next-generation network-communication based intelligent machines driven by the evolution of more complex behavior in networks of communicating units.” Rish believes that AI is naturally headed towards further automation of AI development, away from hard-coded models. In the future, AI systems will also be more bio-inspired and feature more dedicated hardware such as neuromorphic and quantum devices.
“The general trend in AI and in computing as a whole, towards further and further automation and replacing hard-coded approaches with automatically learned ones, seems to be the way to go,” she added.
For now, neuro-symbolic AI combines the best of both worlds in innovative ways by enabling systems to have both visual perception and logical reasoning. And, who knows, maybe this avenue of research might one day bring us closer to a form of intelligence that seems more like our own.
Google researchers are extremely intuitive: just by looking into people’s eyes they can see their problems — cardiovascular problems, to be precise. The scientists trained artificial intelligence (AI) to predict cardiovascular hazards, such as strokes, based on the analysis of retina shots.
The way the human eye sees the retina vs the way the AI sees it. The green traces are the pixels used to predict the risk factors. Photo Credit: UK Biobank/Google
After analyzing data from over a quarter million patients, the neural network can predict the patient’s age (within a 4-year range), gender, smoking status, blood pressure, body mass index, and risk of cardiovascular disease.
“Cardiovascular disease is the leading cause of death globally. There’s a strong body of research that helps us understand what puts people at risk: Daily behaviors including exercise and diet in combination with genetic factors, age, ethnicity, and biological sex all contribute. However, we don’t precisely know in a particular individual how these factors add up, so in some patients, we may perform sophisticated tests … to help better stratify an individual’s risk for having a cardiovascular event such as a heart attack or stroke,” said study co-author Dr. Michael McConnell, a medical researcher at Verily.
Even though you might think that the number of patients the AI was trained on is large, AI networks typically work with much larger sample sizes. In order for neural networks to be more accurate in their predictions, they must analyze as much data as possible. The results of this study show that, for now, the predictions made by AI cannot outperform specialized medical diagnostic methods, such as blood tests.
“The caveat to this is that it’s early, (and) we trained this on a small data set,” says Google’s Lily Peng, a doctor and lead researcher on the project. “We think that the accuracy of this prediction will go up a little bit more as we kind of get more comprehensive data. Discovering that we could do this is a good first step. But we need to validate.”
The deep learning applied to photos of the retina and medical data works like this: the network is presented with the patient’s retinal shot, and then with some medical data, such as age and blood pressure. After seeing hundreds of thousands of these kinds of images, the machine will start to see patterns correlated with the medical data provided. So, for example, if most patients who have high blood pressure have enlarged retinal vessels, the pattern will be learned and then applied when the network is presented with just the retinal shot of a prospective patient. The algorithm correctly identified patients at high cardiovascular risk within a 5-year window 70 percent of the time.
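The learn-then-apply loop described above can be illustrated with a deliberately tiny stand-in. The sketch below is not a deep network and does not use real data: it assumes each retinal image has already been reduced to a single hypothetical “vessel width” number, and it learns one threshold from invented labeled examples before applying it to new measurements.

```python
# Toy training set: (retinal vessel width in arbitrary units, has high blood pressure).
# These numbers are invented for illustration only.
training_data = [
    (1.0, False), (1.2, False), (1.4, False), (1.5, False),
    (1.9, True), (2.1, True), (2.3, True), (1.6, True),
]

def accuracy(threshold, data):
    """Fraction of examples where 'width > threshold' matches the label."""
    correct = sum((width > threshold) == label for width, label in data)
    return correct / len(data)

def fit_threshold(data):
    """Pick the candidate threshold with the highest training accuracy."""
    widths = sorted(width for width, _ in data)
    # Candidate thresholds are midpoints between neighbouring widths.
    candidates = [(a + b) / 2 for a, b in zip(widths, widths[1:])]
    return max(candidates, key=lambda t: accuracy(t, data))

def predict(width, threshold):
    """Apply the learned pattern to a new patient's measurement."""
    return width > threshold

threshold = fit_threshold(training_data)
print(threshold, accuracy(threshold, training_data))
print(predict(2.0, threshold), predict(1.1, threshold))
```

A deep network does the same thing at a vastly larger scale: instead of one hand-picked feature and one threshold, it learns millions of parameters directly from the pixels.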
“In summary, we have provided evidence that deep learning may uncover additional signals in retinal images that will allow for better cardiovascular risk stratification. In particular, they could enable cardiovascular assessment at the population level by leveraging the existing infrastructure used to screen for diabetic eye disease. Our work also suggests avenues of future research into the source of these associations, and whether they can be used to better understand and prevent cardiovascular disease,” conclude the authors of the study.
The paper, published in the journal Nature Biomedical Engineering, is truly remarkable. In the future, doctors will be able to screen for the number one killer worldwide much more easily, and they will be doing it without causing us any physical discomfort. Imagine that!
The internet has always had a crush on Nicolas Cage and put him at the center of thousands of memes. The latest Cage viral mayhem, however, is both entertaining and incredibly creepy. With the help of a crafty piece of software called FakeApp, which uses deep learning technology, people have been inserting Nic Cage’s face into all sorts of famous movies.
FakeApp uses technology that allows you to scan the face of a person, then maps it onto pre-existing video content. In our case, people have hilariously swapped Cage’s face onto Andy Samberg, James Bond, Indiana Jones, even freaking Lois Lane from Superman.
Deep learning employs neural networks made up of interconnected nodes that run automated computations on input data. Deep-learning software attempts to mimic the activity in layers of neurons in the neocortex, the wrinkly 80 percent of the brain where thinking occurs. It’s the closest we’ve come so far to the real purpose of artificial intelligence: real learning, like recognizing patterns from sounds, images, and other data, instead of following task-specific, pre-configured instructions from a programmer.
In the case of FakeApp, after an initial training session, the software’s nodes arrange themselves to convincingly superimpose a celebrity’s face over any kind of video. The more abundant and varied the original video content is, the more convincing the final representation will be.
These Nicolas Cage fake videos are certainly hilarious but it’s not all fun and games. When this technology meets forgery, people’s livelihoods are at risk. Last month, for instance, we learned how a redditor used open-source deep learning tools to face-swap celebrities onto the bodies of porn actresses. There’s a whole subreddit dedicated to fake deep learning porn. The results are surprisingly convincing, considering these clips had a one-man production team. Imagine the kind of things someone with a sizeable budget could accomplish.
Similar technology also exists for faking someone’s voice. A combination of the two means that what used to be traditionally solid evidence in court (video and audio) is now suddenly questionable. What’s more, it’s not hard to imagine a future where such doctored videos and footage can be used to blackmail celebrities or put someone in a bad light. By the time the fake footage is publicly called out, the damage is already done in the minds of most people.
Prepare for #FakeNews 2.0 — uglier and more polarizing than ever. The future doesn’t seem boring at all but it does seem frightening.
When we think of neural networks or machine learning we tend to imagine them involved in radically transforming the world by preventing crimes before they happen or improving energy efficiency. But you don’t need to change the world to make good use of machine learning. Sometimes, using this tech to help a mom-and-pop business can be just as impressive.
Makoto Koike, a former embedded systems designer, first got into machine learning after he found out that Google’s DeepMind division used it to train a machine to beat a human champion at Go. He later learned about TensorFlow, an open source library released by Google that allows engineers to implement deep neural networks without prior knowledge of all the complex mathematical and optimization algorithms. Using TensorFlow, Koike saved countless man-hours for his parents who run a cucumber farm in Japan.
Deep learning allows a computer to learn from training data, such as a huge database of images, which defining features it needs to recognize. With the help of a hierarchy of numerous artificial neurons, deep learning can then classify all sorts of images with high accuracy. A neural network could be used to recognize different breeds of cats from varying images or even predict what happens next.
Crooked cucumbers are ranked lower. Credit: Makoto Koike
His deep learning system uses a Raspberry Pi 3 as the controller and a personal computer that’s connected to the TensorFlow server. Once a cucumber travels down the line and in front of the system’s camera, a picture is taken and then Makoto’s modified neural network tries to ‘understand’ what it’s looking at. First, it determines that the item sitting right in front of the camera is a cucumber, then it further classifies it because, if you didn’t know, cucumbers come in all sorts of sizes, textures, and varying degrees of quality. For instance, thorny cucumbers with many prickles still on them that are fresh and vivid in color are sold at a premium, so sorting the gems from the clutter can be very lucrative.
“The sorting work is not an easy task to learn. You have to look at not only the size and thickness, but also the color, texture, small scratches, whether or not they are crooked and whether they have prickles. It takes months to learn the system and you can’t just hire part-time workers during the busiest period. I myself only recently learned to sort cucumbers well,” Makoto told Google Cloud’s blog.
Makoto spent about three months taking 7,000 pictures of cucumbers sorted by his mother, but this was still not enough. Part of the blame lies with his system, which can only take and analyze pictures of cucumbers at a resolution of 80×80 pixels.
“When I did a validation with the test images, the recognition accuracy exceeded 95%. But if you apply the system with real use cases, the accuracy drops down to about 70%. I suspect the neural network model has the issue of “overfitting” (the phenomenon in a neural network where the model is trained to fit only to the small training dataset) because of the insufficient number of training images.”
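The gap Makoto describes, between accuracy on data a model has already seen and accuracy on new data, can be reproduced with a toy example. The sketch below is not his TensorFlow model: it uses a nearest-neighbour classifier that simply memorizes a small, invented training set (with two deliberately mislabeled examples) and then grades cucumbers it has never seen. The `curvature` feature and all numbers are assumptions made for illustration.

```python
# Toy training set: (curvature score, grade assigned by a human sorter).
# Two labels are deliberately "noisy" to mimic mistakes in a small dataset.
training_set = [
    (0.1, "A"), (0.2, "A"), (0.3, "B"), (0.4, "A"),
    (0.6, "B"), (0.7, "B"), (0.8, "A"), (0.9, "B"),
]

# Unseen cucumbers, graded by the underlying rule: curvature below 0.5 is "A".
new_cucumbers = [
    (0.15, "A"), (0.29, "A"), (0.45, "A"),
    (0.65, "B"), (0.79, "B"), (0.95, "B"),
]

def predict(curvature, memory):
    """1-nearest-neighbour: copy the grade of the closest memorised example."""
    _, grade = min(memory, key=lambda example: abs(example[0] - curvature))
    return grade

def accuracy(dataset, memory):
    """Share of examples in dataset whose predicted grade matches the label."""
    correct = sum(predict(c, memory) == grade for c, grade in dataset)
    return correct / len(dataset)

train_accuracy = accuracy(training_set, training_set)
real_accuracy = accuracy(new_cucumbers, training_set)
print(train_accuracy, real_accuracy)
```

The model scores perfectly on the examples it memorized, quirks included, and noticeably worse on fresh ones. More training images, or a simpler model, would narrow that gap.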
Makoto needs to invest a lot more than $1,000 (the money he has spent so far) to make his cucumber assembly line really stellar. Even so, his results are inspiring because they go to show just how much you can achieve if you set your mind to it.
The word on every tech executive’s lips today is data. Curse or blessing, there’s so much data lying around – with about 2.5 quintillion bytes of data added each day – that it’s become increasingly difficult to make sense of it in a meaningful way. There’s a solution to the big data problem, though: machine learning algorithms that get fed countless variables and spot patterns otherwise invisible to humans. Researchers have already made use of machine learning to solve challenges in medicine, cosmology and, most recently, crime. Tech giant Hitachi, for instance, developed a machine learning interface reminiscent of Philip K. Dick’s Minority Report that can predict when, where and possibly who might commit a crime before it happens.
Machines listening for crime
Screenshot from the movie Minority Report.
It’s called Visualization Predictive Crime Analytics (PCA) and while it hasn’t been tested in the field yet, Hitachi claims that it works by gobbling immense amounts of data from key sensors layered across a city (like those that listen for gunshots), weather reports and social media to predict where crime is going to happen next. “A human just can’t handle when you get to the tens or hundreds of variables that could impact crime,” says Darrin Lipscomb, who is directly involved in the project, “like weather, social media, proximity to schools, Metro [subway] stations, gunshot sensors, 911 calls.”
Real footage of the Hitachi crime predicting interface which officers might use. Image: Hitachi
Police nowadays use all sorts of gimmicks to either rapidly intervene when a crime is taking place or take cues and sniff out leads that might help them avert one. For instance, police officers might use informers, scour social media for gang altercations or draw a map of thefts to predict when the next one might take place. This is a cumbersome process and officers are only human after all. They will surely miss some valuable hints a computer might easily draw out. Of course, the reverse is also true, as is often the case in fact, but if we’re talking about volume – predicting thousands of possible felonies every single day in a big city – the deep learning machine will beat even the most astute detective.
PCA is particularly effective, supposedly, at scouring social media, which Hitachi says improves accuracy by 15%. The company used a natural language processing algorithm to teach its machines how to understand colloquial text or speech posted on Facebook or Twitter. It knows, for instance, how to pull out geographical information and tell if a drug deal might take place in a neighborhood.
Officers would use PCA’s interface – quite reminiscent of Minority Report, again – to see which areas are more vulnerable. A colored map shows where cameras and sensors are placed in a neighborhood and alerts the officer on duty if there’s a chance a crime might take place there, be it a robbery or a gang brawl. Dispatch would then send officers to the area to intervene or possibly deter would-be felons from engaging in criminal activity.
PCA provides a highly visual interface, with color-coded maps indicating the intensity of various crime indicators. Image: Hitachi
In any event, this is not evidence of precognition. The platform just flags vulnerable neighborhoods and alerts officers to a would-be crime. You might have heard about New York City’s stop-and-frisk practice, where people deemed suspicious are searched for guns or drugs. PCA works in a fundamentally different way since it actually offers officers something to start with – it at least provides a more focused starting point. “I don’t have to implement stop-and-frisk. I can use data and intelligence and software to really augment what police are doing,” Lipscomb says. Of course, this raises the question: won’t this lead to innocent people being targeted on mere suspicion fed by a computer? Well, just look at stop-and-frisk. More than 85% of those searched on New York’s streets are either Latino or African-American. Even if you account for differences in crime rates among ethnic groups, stop-and-frisk is clearly biased. The alternative sounds a lot better since police might actually know who to target.
Hitachi’s crime prediction tool will soon be tested in six large US cities, which Hitachi has declined to name. The trials will be double-blind, meaning police will go about business as usual while the machine runs in the background. Then Hitachi will compare the crimes the police report with the crimes the machine predicted. If the two overlap beyond a statistical threshold, then you have a winner.
Using a novel deep learning algorithm, a team at UC Berkeley demonstrated a robot that learns on the fly and performs various tasks that weren’t pre-programmed. It starts off shy and clumsy, but eventually gets the hang of it. For instance, after stumbling around its environment for a bit, when given a new task but no further instructions, the robot learned by itself to assemble LEGO bricks or twist caps onto pill bottles.
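Hitachi has not published how the comparison would be scored, so the following is only a rough sketch of one way to measure overlap between predictions and police reports. It assumes each incident can be reduced to a hypothetical (area, day) pair, uses invented data, and applies a simple fixed cut-off rather than a proper statistical test.

```python
def overlap_metrics(predicted, reported):
    """Compare predicted incidents with reported ones.

    Each incident is a hypothetical (area, day) pair. Returns precision
    (share of predictions that were reported) and recall (share of
    reported crimes that were predicted).
    """
    predicted, reported = set(predicted), set(reported)
    hits = predicted & reported
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(reported) if reported else 0.0
    return precision, recall

# Invented example data: what the machine flagged vs. what police recorded.
predicted = [("north", 1), ("north", 2), ("east", 2), ("south", 3)]
reported = [("north", 1), ("east", 2), ("west", 3)]

precision, recall = overlap_metrics(predicted, reported)
THRESHOLD = 0.5  # arbitrary cut-off chosen for this example
print(precision, recall, precision >= THRESHOLD and recall >= THRESHOLD)
```

Reporting both numbers matters: a system that flags every neighborhood every day would catch every crime but waste most patrols, while one that flags almost nothing would rarely be wrong and rarely be useful.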
The BRETT assembling a toy airplane. Image: YouTube
We humans are easily impressed by robots. They’re fast, efficient and seem to do most of our jobs better than we ever could. In fact, half of all jobs today (drivers, tellers, call center workers, etc.) could be replaced by bots within 20 years. But most robots aren’t smarter than a vacuum cleaner. Ask them to do anything beyond their dull, pre-programmed repetitive tasks and … well, they won’t respond in any way since they don’t have what you or I would call “thinking” or “consciousness”. This is where artificial intelligence comes in, you might say. Well, the artificial intelligence dream gained a lot of hype during the ’70s, but soon died off after specialists realized their forecasts of a sentient artificial being coming to life in the year 2000 or thereabouts were way off. Recently, the movement has gathered steam again. Berkeley’s BRETT (Berkeley Robot for the Elimination of Tedious Tasks) is one prime example.
The bot was developed using deep learning algorithms, which are inspired by the way human neural networks fire and interact to help us make sense of the world.
“For all our versatility, humans are not born with a repertoire of behaviours that can be deployed like a Swiss army knife, and we do not need to be programmed,” said robotics researcher Sergey Levine in a press release. “Instead, we learn new skills over the course of our life from experience and from other humans. This learning process is so deeply rooted in our nervous system, that we cannot even communicate to another person precisely how the resulting skill should be executed. We can at best hope to offer pointers and guidance as they learn it on their own.”
Google’s StreetView or the equally impressive Siri are just a taste of what’s to come from the field of AI. Building similar networks for nuts-and-bolts robots has proven a lot more difficult, however. BRETT is quite a milestone in this respect since it uses deep learning to complete new motor tasks.
In a series of experiments, BRETT was first tasked to assemble a toy airplane wheel. Clumsy and cumbersome, the robot eventually finished the operation in 12 long minutes. However, applying what it had learned previously, BRETT then quickly completed other motor tasks like stacking LEGO bricks or placing pegs into holes.
Key to its deep learning algorithm is a reward system, which scores movements that lead to a completed task higher than those that do not.
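The article does not detail BRETT’s actual algorithm, so the following is just a minimal sketch of reward-driven trial and error in general. It assumes a made-up one-dimensional task (moving a peg to a hole at a fixed position), a hypothetical `reward` function that gives a bonus for completing the task, and a simple search that keeps whichever movement scores higher.

```python
HOLE_POSITION = 7.0   # where the peg must end up (made-up task)
TOLERANCE = 0.25      # how close counts as "task completed"

def reward(position):
    """Score a movement: completed tasks get a bonus, misses are penalised
    by how far the peg ended up from the hole."""
    error = abs(position - HOLE_POSITION)
    return 10.0 - error if error <= TOLERANCE else -error

def learn(start=0.0, step=1.0, rounds=50):
    """Trial-and-error search: try a move in each direction and keep
    whichever scores higher; shrink the step when neither helps."""
    position = start
    for _ in range(rounds):
        candidates = [position - step, position + step]
        best = max(candidates, key=reward)
        if reward(best) > reward(position):
            position = best      # the higher-scoring movement is kept
        else:
            step /= 2            # refine the search around the current best
    return position

final = learn()
print(final, reward(final))
```

Nothing in `learn` knows where the hole is; it only sees scores. That is the appeal of a reward signal: the designer specifies what counts as success, and the system works out the movements on its own.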
“We still have a long way to go before our robots can learn to clean a house or sort laundry, but our initial results indicate that these kinds of deep learning techniques can have a transformative effect in terms of enabling robots to learn complex tasks entirely from scratch,” said Pieter Abbeel of UC Berkeley’s Department of Electrical Engineering and Computer Sciences. “In the next five to 10 years, we may see significant advances in robot learning capabilities through this line of work.”