Tag Archives: big data

Big data might be the missing puzzle piece in understanding climate change

A new approach might change how we study our planet’s climate.

Met Office Climate Data – Month by Month w/ Latitude distribution (January).

When it comes to information, the world has plenty of it. Data sets are increasing rapidly in almost all fields imaginable, with researchers having access to a wealth of data they couldn’t have even dreamed of a few decades ago.

The question is no longer whether you have access to data, but rather how you process and interpret it. This is the so-called big data problem.

Humans and Data

Big data has offered a number of much-needed innovations in modern society. It has helped improve health care (through personalized medicine, prescriptive analytics, clinical risk intervention and predictive analytics), infrastructure design (by helping planners understand how many cars are on the road at a given time), education (by helping universities design programs tailored to demand), and many other fields. But with regard to climate, big data has been less impactful.

There are a few reasons why this has been the case, among them the sheer diversity of the available data as well as the inhomogeneities in data availability (e.g. we have more data from densely populated cities than from isolated areas). But without a doubt, Georgia Tech researchers say, methodology is also to blame. Too often, scientists employ methods that offer simplistic “yes or no” answers instead of painting an accurate picture. This is counterproductive and needs to change.

“It’s not that simple in climate,” said Annalisa Bracco, a professor in Georgia Tech’s School of Earth and Atmospheric Sciences. “Even weak connections between very different regions on the globe may result from an underlying physical phenomenon. Imposing thresholds and throwing out weak connections would halt everything. Instead, a climate scientist’s expertise is the key step to finding commonalities across very different data sets or fields to explore how robust they are.”

Satellites offer a trove of valuable data, but analyzing and interpreting that data is not always straightforward. Image via Wikipedia.

She and her colleagues wanted a data analysis method that relies more on the data itself and less on the interpreter’s viewpoint and skill, so they developed a way of mining climate data sets that is more self-contained than traditional tools. They claim that this method produces results that are more robust and transparent. In other words, the data is mined in such a way that two different people analyzing it will reach the same conclusion — something which is not always the case currently.
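The method itself is laid out in the paper; as a rough sketch of the underlying idea — keep even weak connections between regions for an expert to judge, rather than discarding them with a hard threshold — one could compute a full correlation network between climate time series. The synthetic data and function below are purely illustrative, not the authors' actual pipeline.

```python
import numpy as np

def correlation_network(series, names):
    """Pairwise correlations between climate time series, with no cut-off.

    Instead of imposing a threshold and throwing out weak connections,
    every link is returned so that a climate scientist can judge whether
    a weak one reflects a real underlying physical phenomenon.
    """
    corr = np.corrcoef(series)                       # n x n correlation matrix
    n = len(series)
    edges = [(names[i], names[j], corr[i, j])
             for i in range(n) for j in range(i + 1, n)]
    return sorted(edges, key=lambda e: abs(e[2]), reverse=True)

# Illustrative example: synthetic monthly anomalies for three regions.
rng = np.random.default_rng(0)
tropics = rng.normal(size=120)
north_atlantic = 0.4 * tropics + rng.normal(size=120)   # weakly coupled to the tropics
southern_ocean = rng.normal(size=120)                    # independent

for a, b, r in correlation_network(
        [tropics, north_atlantic, southern_ocean],
        ["tropics", "north_atlantic", "southern_ocean"]):
    print(f"{a:>15} -- {b:<15} r = {r:+.2f}")
```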

To make things even better, the methodology is open source and currently available to scientists (or passionate amateurs) around the world.

Another advantage of this approach is that it allows researchers to integrate several data types more easily. Climate is a complex, multifaceted issue, and it’s often hard to make sense of every bit of information from the field.

Furthermore, researchers can work with the data without extensive knowledge of modeling or hardcore coding — an increasingly pressing issue, as most climate scientists don’t have a background in programming.

“There are so many factors — cloud data, aerosols and wind fields, for example — that interact to generate climate and drive climate change,” said Athanasios Nenes, another College of Sciences climate professor on the project. “Depending on the model aspect you focus on, they can reproduce climate features effectively — or not at all. Sometimes it is very hard to tell if one model is really better than another or if it predicts climate for the right reasons.”

It remains to be seen whether or not this approach will be successful, but it certainly has the potential to foster a newer, simpler, and more systematic approach to studying climate.

“Climate science is a ‘data-heavy’ discipline with many intellectually interesting questions that can benefit from computational modeling and prediction,” said Constantine Dovrolis, a professor in Georgia Tech’s School of Computer Science. “Cross-disciplinary collaborations are challenging at first — every discipline has its own language, preferred approach and research culture — but they can be quite rewarding at the end.”

The paper, “Advancing climate science with knowledge-discovery through data mining,” has been published in npj Climate and Atmospheric Science, a Nature Partner Journal.

Big Data predicts 1,500 mineral species are waiting to be found. Ten have already been discovered

Parisite-(La) is a newly discovered mineral species that was predicted by big data analysis. It was discovered in Brazil’s northeast state of Bahia. Credit: Luiz Menezes.

The same approach that Facebook uses to graph networks or scientists employ to map the spread of diseases has been applied to predict new mineral species and deposits. It’s the first time network theory and big data — a buzzword describing the use of huge data sets to reveal patterns and make predictions otherwise difficult to obtain — have been employed to find new minerals.

High-tech geology

We know of about 5,200 mineral species, each of which has a unique combination of chemical composition and atomic structure. There are millions of mineral samples housed in museums, warehouses, universities or private collections, many of which have been neatly described and cataloged. For instance, it’s standard practice for a mineral sample to be tagged with information like the location it was recovered from, the level of occurrence, the age of the deposit, or the mineral’s growth rate.

When you combine this information — not on one deposit but a myriad — with data on the surrounding geography, the geological setting, and coexisting minerals, it’s possible to fill in the blanks and infer the existence of not only deposits but also new mineral species.

It’s only recently that the technology has enabled scientists to use such a ‘big data’ approach.

“The quest for new mineral deposits is incessant, but until recently mineral discovery has been more a matter of luck than scientific prediction,” said Dr. Shaunna Morrison of the Deep Carbon Observatory in a statement. “All that may change thanks to big data.”

Just as data scientists use complex data sets to understand social media connections, city traffic or even metabolic pathways, the same methods can be applied to mineralogy and petrology. There are few fields of science where big data can’t help make a breakthrough.

Bearing this in mind, Morrison and colleagues visualized data from multiple variables on thousands of mineral samples sourced from hundreds of thousands of locations around the world, all represented within a single graph that reveals patterns of occurrence and distribution which would otherwise be extremely difficult, if not impossible, to infer.
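The authors' actual network analysis is far richer, but as a minimal sketch of how such a graph can be assembled and queried (the occurrence records and scoring rule below are invented for illustration), one can link minerals through the localities where they co-occur and flag minerals that a locality's known assemblage would suggest but that haven't been reported there yet.

```python
from collections import defaultdict
from itertools import combinations

# Toy occurrence records: locality -> minerals reported there (illustrative only).
occurrences = {
    "locality_A": {"calcite", "quartz", "pyrite"},
    "locality_B": {"calcite", "quartz", "fluorite"},
    "locality_C": {"quartz", "pyrite", "fluorite"},
    "locality_D": {"calcite", "quartz"},
}

# Mineral co-occurrence network: edge weight = number of shared localities.
cooccurrence = defaultdict(int)
for minerals in occurrences.values():
    for a, b in combinations(sorted(minerals), 2):
        cooccurrence[(a, b)] += 1

def candidate_minerals(locality):
    """Very crude 'missing mineral' score: a mineral absent from a locality but
    strongly co-occurring with minerals present there is a candidate to look for."""
    present = occurrences[locality]
    scores = defaultdict(int)
    for (a, b), weight in cooccurrence.items():
        if a in present and b not in present:
            scores[b] += weight
        elif b in present and a not in present:
            scores[a] += weight
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Minerals worth searching for at locality_D, ranked by co-occurrence score.
print(candidate_minerals("locality_D"))
```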

From here on, it’s only a matter of filling in the gaps in the list of known minerals. Moreover, the analysis also tells us where to dig to find new deposits. Robert Hazen at the Carnegie Institution for Science discusses mineral evolution, mineral ecology, and mineral network analysis in an hour-long lecture.

Ewingite. Photo: Travis Olds.

The hunt for new minerals is on

Already, this approach enabled the researchers to predict the existence of 145 missing carbon-bearing minerals and where to find them. To accelerate their discovery, the Deep Carbon Observatory launched the Carbon Mineral Challenge to inspire professional and amateur mineralogists alike to hunt down minerals from this shortlist. Already, ten have been found. Among them is ewingite, the most structurally complex known mineral on Earth. Many more minerals await discovery, though.

“We have used the same kinds of techniques to predict that at least 1,500 minerals of all kinds are ‘missing,’ to predict what some of them are, and where to find them,” Dr. Hazen says.

Analyses of complex mineral datasets also help untangle some of the intricate relationships between geology and biology, leading to new insights into the co-evolution of the geosphere and biosphere. For instance, mineral networks of igneous rocks can help retrace Bowen’s reaction series, which describes how characteristic minerals form as magma cools. The analysis was precise enough that the predicted sequence of minerals matched the real one. In the future, mineral networks coupled with data on biomarker molecules could reveal how cells and minerals interact.

Mineral networks can also serve as a powerful learning tool in academia for mineralogy and petrology, helping students visualize rocks, minerals and their relationships.

And of course, the industry could make billions by mining new deposits. Some of the yet-to-be-identified minerals could have remarkable properties that enable novel products on the market. Really, there are countless applications of big data to minerals, and the implications could be far-reaching.

“Minerals provide the basis for all our material wealth,” Morrison concluded, “not just precious gold and brilliant gemstones, but in the brick and steel of every home and office, in cars and planes, in bottles and cans, and in every high-tech gadget from laptops to iPhones.”

“Minerals form the soils in which we grow our crops, they provide the gravel with which we pave our roads, and they filter the water we drink.”

“This new tool for understanding minerals represents an important advance in a scientific field of vital interest.”

Findings appeared in the journal American Mineralogist.

How Google Maps can tell if there are traffic jams

Google Maps

If you use Google Maps to plan your commutes or other travels, you might have noticed how adept the app has become at predicting traffic. You know the fastest way out of the city might be to take the highway, yet Google tells you to take the surface roads. A few minutes later you’re on your way, while the highway is packed. An accident must have occurred half an hour earlier, or a roadblock was just set up to make way for infrastructure works. You’re happy, but at the same time you can’t help but wonder: ‘how in the world did they pull it off?’

One of Google’s main assets is data. Lots of it. If you have location services turned on (that’s enough on Android) and Google Maps open (on Apple devices), Google is fed a stream of anonymous bits describing location, relative velocity and itinerary. Using this data, from both car passengers and pedestrians, Google’s machine learning algorithms predict traffic jams. A slew of cars just started moving 30% slower than they should have? A couple of minutes later Google alerts you to change your route — a route that Google optimized taking into account the constantly shifting masses of vehicles in real time.

Of course, Google can also predict the future. Scores of historical data lend Google a clairvoyant ability. These predictions are based on the data recorded on the same day in previous weeks, months or years, and on whether it was a public holiday, a working day or just a normal weekend. It can even tell there’s a marathon going on from nothing but location data.
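Google’s actual pipeline is proprietary, but a minimal sketch of the idea described above might look like the following: compare the current average speed on a road segment, estimated from anonymized pings, against a historical baseline for that segment and hour, while crudely filtering out stationary users so a coffee stop isn’t mistaken for a jam. All names, thresholds and data here are assumptions made up for illustration.

```python
from statistics import mean

def segment_speeds(pings):
    """Estimate current speeds per road segment from anonymized pings.

    Each ping is (segment_id, speed_kmh). Near-stationary pings (pedestrians,
    parked cars, coffee stops) are crudely filtered out.
    """
    by_segment = {}
    for segment, speed in pings:
        if speed < 3:                      # likely stationary or walking -- ignore
            continue
        by_segment.setdefault(segment, []).append(speed)
    return {seg: mean(speeds) for seg, speeds in by_segment.items()}

def detect_jams(pings, historical_kmh, slowdown=0.7):
    """Flag segments moving well below their historical average for this
    weekday and hour (e.g. below 70% of the baseline)."""
    current = segment_speeds(pings)
    return [seg for seg, speed in current.items()
            if seg in historical_kmh and speed < slowdown * historical_kmh[seg]]

# Illustrative data: the highway pings are slow, the surface road pings are normal.
pings = [("highway_12", 25), ("highway_12", 30), ("highway_12", 2),
         ("surface_rd_4", 48), ("surface_rd_4", 52)]
historical = {"highway_12": 90, "surface_rd_4": 50}   # typical speeds for this hour
print(detect_jams(pings, historical))                 # ['highway_12']
```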

Of course, the traffic data comes from multiple sources in order for it to be considered reliable. Besides smartphones, Google employs smart sensors in key locations and data from local authorities. It also incorporates traffic and incident data fed by the Waze app and its community. Waze, the navigation app, was bought by Google in 2013 for $1 billion.

As more and more drivers use the app, the traffic predictions become more reliable because Google Maps can look at the average speed of cars traveling along the same route without misinterpreting someone’s morning coffee stop as a traffic jam. Conversely, if you live in a city or country with few Google Maps users, traffic prediction will be poor or won’t be activated at all.

Google isn’t alone, though, even if its Maps app is the most widely used. Microsoft is currently working on the Traffic Prediction Project with select partners to help predict traffic jams up to an hour in advance. Again, it’s all about big data: crunching all available traffic data, including historical numbers where available, from transport departments, road cameras, Microsoft’s Bing traffic maps, and even drivers’ social networks.

A recent report shows America will waste $2.8 trillion on traffic jams by 2030. The solution? Self-driving cars (these drive steadily and constantly optimize their velocity relative to other cars), traffic-jam solutions like the ones offered by Google Maps and Microsoft (until all cars are self-driving) and, wait for it, fewer cars. Yes, the best solution to traffic jams is to stop owning a car and start using public transportation.

‘Data Smashing’ algorithm might help declutter Big Data noise without Human Intervention

Humanity is currently sitting on an immense well of information, and it’s only growing exponentially. To make sense of all the noise — whether we’re talking about speech recognition, identifying cosmic bodies, or ranking search engine results — we need highly complex algorithms that use less processing power by hitting the bull’s eye, or getting as close to it as possible. In the future, such algorithms will likely rely on machine learning technology that gets smarter with each pass over the data, probably aided by quantum computing. Until then, we have to make use of conventional algorithms, and a most exciting paper detailing one such technique was recently published.

Smashing data – the bits and pieces that survive are the most important

Big Data. Credit: 33rd Square.

Called ‘data smashing’, the algorithm tries to fix one major flaw in today’s information processing. Immense amounts of data are currently being fed in, and while algorithms help us declutter, at the end of the day companies and governments still need experts to oversee the process and lend it a much-needed human touch. Basically, computers are still pretty bad at recognizing complex patterns. Sure, they’re awesome at crunching the numbers, but in the end, humans need to compare the output scenarios and pick out the most relevant answer. As more and more processes are being monitored and fed into large data sets, however, this task is becoming ever more difficult, and human experts are in short supply.

The algorithm, developed by Hod Lipson, associate professor of mechanical engineering and of computing and information science, and Ishanu Chattopadhyay, a former postdoctoral associate with Lipson now at the University of Chicago, is nothing short of brilliant. It works by estimating the similarities between streams of arbitrary data without human intervention, and even without access to the data sources.

Basically, streams of data are ‘smashed’ against one another to tease out unique information by measuring what remains after each ‘collision’. The more information survives, the less likely it is that the streams originated from the same source.
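The published algorithm works on probabilistic models of the streams and measures how close the ‘collision’ residue is to flat noise; the toy sketch below is not that algorithm, only an illustration of the flavor of comparing raw symbol streams without modeling their source: estimate each stream’s empirical transition statistics and measure how far apart they are — streams from the same hidden process leave little difference behind.

```python
import numpy as np

def transition_matrix(stream, alphabet_size=2):
    """Empirical next-symbol probabilities of a symbol stream (toy model)."""
    counts = np.ones((alphabet_size, alphabet_size))    # Laplace smoothing
    for a, b in zip(stream[:-1], stream[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def stream_distance(s1, s2, alphabet_size=2):
    """Crude stand-in for a 'collision': streams from the same hidden process
    differ little in their statistics; dissimilar processes differ a lot."""
    t1 = transition_matrix(s1, alphabet_size)
    t2 = transition_matrix(s2, alphabet_size)
    return float(np.abs(t1 - t2).mean())

rng = np.random.default_rng(1)
# Two streams generated by the same biased process, one by a different process.
same_a = (rng.random(5000) < 0.8).astype(int)
same_b = (rng.random(5000) < 0.8).astype(int)
other = (rng.random(5000) < 0.3).astype(int)

print(stream_distance(same_a, same_b))   # small -- likely the same underlying process
print(stream_distance(same_a, other))    # larger -- different processes
```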

Data smashing could open the door to a new body of research — it’s not just about helping experts sort through data more easily, it might also identify anomalies that are impossible for humans to spot, by virtue of pure computational brute force. For instance, the researchers demonstrated data smashing on real-world problems, including the detection of anomalous cardiac activity from heart recordings and the classification of astronomical objects from raw photometry. The results were on par with the accuracy of specialized algorithms and heuristics tweaked by experts.