Big data in research – both reality and rhetoric

Astronomical amounts of new digital information about the world, our genetic heritage and our habits are continuously being generated. This information is a goldmine for research, as long as the data can be accessed, stored and analysed.
“We have a lot of expertise in the field. More and more areas of Lund University are nearing the threshold for big data as an integral part of research and teaching”, says Sven Strömqvist, Pro Vice-Chancellor for research and research infrastructure.

Increasing amounts of new information are being gathered, and old information is also being digitalised and stored. Researchers can use sophisticated methods of analysis to uncover patterns and correlations in the data that could not previously be identified. Expectations are high in everything from medicine and biology to climate change research, but digitalisation is also taking place at a furious pace in other fields.

“The field is established in physics and MAX IV will produce huge quantities of data. Medicine has a number of examples of register research, biobanks and imaging, i.e. creating images from data. However, the social sciences also have register research that is at the forefront in the field and the humanities will soon be there, for example with data from archaeological excavations and empirical data from the 6 000 plus languages of the world”, says Sven Strömqvist.

In the public debate, the educational potential of big data is often emphasised. The ability to present huge quantities of data as images could change the world. Many big data visions claim that if complicated connections are visualised so that we understand them in a new way, we will make better decisions. These visions come from both the research community and the commercial sector, where companies like Google, Facebook and Twitter generate incredible amounts of data about people.

However, big data is simultaneously reality and rhetoric. Alongside the opportunities, the large amounts of data also present challenges, including for universities. Big data cannot be handled using traditional database methods. It requires advanced analysis software, powerful processors and knowledgeable staff.

“It is important that we take a more thorough approach to the mathematical developments that are necessary for a new generation of efficient information technology that can deal with big data. A step in this direction is the mathematics platform that Magnus Fontes has set up [see separate article]”, says Sven Strömqvist.

Openness and collaboration across disciplinary boundaries are also required if big data is to pay dividends. The data must be standardised, stored and made available globally. Both the EU’s Horizon 2020 programme and the Swedish Research Council point out the importance of open data and interdisciplinary collaboration in new proposals and calls for applications.

Other important issues are the security aspects, namely the lack of secrecy and the threat of a Big Brother society, and the reliability of findings that are based on correlations rather than empirically proven causal relationships.

Britta Collberg

This is big data

WHAT: Digitally stored data sets that are so large (usually measured in terabytes and petabytes) that they are difficult to process using traditional methods.

1.5 petabytes = the size of ten billion photos on Facebook

2.5 petabytes = the amount of information/memories that a human brain can store

20 petabytes = the amount of data processed every day by Google (figure from 2012)

WHERE: Generated in meteorology, bioinformatics, genomics, physics, environmental research, retail, advanced simulations and defence, as well as in communication services with many users, such as mobile telephony and web services like YouTube, Flickr, Twitter, Facebook and Google.

BUZZWORD: Big data really emerged as a concept in 2009.

HYPE: Critics see big data as a hyped-up concept to attract more funding in research or increase profits in business.

PARADIGM SHIFT: E-science based on big data is a new research field that aims to find patterns in large quantities of data. The patterns do not show causal relationships, but rather correlations, which can shed light on problems that are difficult to solve. The most enthusiastic supporters claim that big data revolutionises research and entails ‘the death of theory’. Many people talk about a paradigm shift and a golden age, while also maintaining that big data complements rather than replaces traditional empirical research.
