Supporting Big Data research in Azerbaijan

Huge amounts of information are gathered every day about things as diverse as climate change and earthquakes. As our ability to collect information advances, the amount, complexity and speed of the data generated also increase. This continuing growth creates difficulties for researchers in data storage, management and analysis. For that reason, ‘Big Data’ has become one of the current and future research frontiers.

In Azerbaijan, e-governance is the current focus of Big Data work by Ramiz Aliguliyev, who is a head of department at the Institute of Information Technology (IIT) and Corresponding Member of the Azerbaijan National Academy of Sciences (ANAS). He explains how he deals with the challenges he has faced, with the support of the EU-funded EaPConnect project, the GÉANT pan-European network, and the Azerbaijani research and education network AzScienceNet.

What is Big Data?

“Big Data is a term that describes large volumes of data, both structured and unstructured. According to a 2012 definition by Gartner, Big Data is high-volume, high-velocity and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimisation. The ‘3Vs model’ (volume, variety and velocity) describing the three main characteristics of Big Data was introduced by Gartner analyst Doug Laney in 2001.

There is no defined threshold above which data is considered ‘Big Data’, because these days data is generated in such varied amounts and ways. According to IBM, 90% of data in the world has been created in just the past two years. The amount of data has increased exponentially, and that is why it is difficult to set a limit.”

What kinds of challenges does Big Data bring?

“It is difficult or almost impossible to collect, store, analyse, and visualise all Big Data using existing technologies, methods, and algorithms. Presently, the amount of data is so large that it cannot be collected on a single computer, even a high-capacity one; ‘traditional’ methods are not enough. The main problems that arise while processing Big Data are storage and velocity. The size of the data increases at great speed and, no matter how much you modify the algorithms, the power of a personal computer is not enough. Even if the computing power is sufficient, a problem with storage is inevitable. There are two possibilities: to modify the traditional methods themselves or to devise a new method that can be run on a computer. To test the effectiveness of the methods, two different algorithms are run on the same platform, on the same machine, and then comparisons can be made. In this case, traditional methods no longer work because of storage problems. For that reason, it is more advantageous to use the capabilities and power of data centres.”

What is the best method to analyse Big Data?

“The best way to analyse data collected in all areas of science is ‘clustering’. Clustering is one of the essential data mining tools for Big Data analysis and is a data analysis technique that is commonly used in many fields. It groups a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. It is not possible to determine in advance to which class newly generated data belongs. After the data is collected, it is analysed with clustering methods, divided into groups and interpreted.

As clustering algorithms come with high computational costs, the question is how to cope with this problem: how to deploy clustering techniques and get results within a reasonable time. Clustering has become more common as more data is generated and traditional methods have failed. Also, with the advent of the Internet, the volume of data has boomed.”
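
As a rough illustration of the grouping idea described above, the sketch below runs a basic k-means loop on a handful of two-dimensional points. It is not the method used in the research discussed here: k-means is only one common clustering technique, and the function name `kmeans`, its parameters and the sample points are all invented for this example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with a basic k-means loop."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical sample data: two visibly separate groups of points.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]
labels, centroids = kmeans(points, k=2)
```

Each pass compares every point with every centroid, which hints at why the computational cost grows quickly as the data grows and why a single machine soon becomes insufficient.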

What are some of the benefits of Big Data analysis?

“The generation of Big Data is happening in all segments of society from all kinds of sources. One of the tools that gives the best results in analysis is data mining, including special clustering methods. Also, text mining has a great advantage, for example in fighting Web spam. Text mining, also referred to as text analysis, is now applied by users, governmental organisations, research institutions and business companies. Text mining involves the automatic extraction of high-quality information from written resources and enables researchers to find more information in a faster and more efficient way. The most widespread technology analyses the press publications of any country or the world over, for example, the last month, to understand patterns and trends in the most-published or most-read topics, who read them, the comments, etc. By this method, it is possible to define opinions on particular topics. With this kind of analysis, the main goal is to prevent Internet crimes and fraud.

Web spam now generates huge volumes of ‘Big Data’. Web spam is low-value content that is created to improve search engine rankings and is considered Web data, that is, data that is generated when the Internet is accessed from a computer. Currently, we are conducting research on the development of methods and algorithms to combat Web spam. In the fight against Web spam, it is usually investigated whether or not it is related to a website and, with the application of data mining technologies to large amounts of Web data, it is intended to improve Web services.”
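
To give a sense of the simplest form of the text mining described above, the following sketch counts the most frequent terms across a few short texts, a very reduced version of spotting the most-published topics. It is an illustration only and not the approach used in the research: the headlines are invented sample data, and `top_terms` and `STOP_WORDS` are hypothetical names chosen for this example.

```python
import re
from collections import Counter

# Illustrative stop-word list only; real systems use far larger ones.
STOP_WORDS = {"the", "a", "of", "in", "on", "to", "and", "for", "is"}


def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across all documents."""
    counts = Counter()
    for text in documents:
        # Lower-case the text and keep words of two or more characters,
        # allowing hyphens inside a word (for example "e-government").
        words = re.findall(r"[a-z][a-z-]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Invented sample headlines, used only to exercise the function.
articles = [
    "New e-government portal opens for citizens",
    "Citizens rate the e-government portal",
    "Oil prices and the portal for energy data",
]
trending = top_terms(articles)
```

Here `trending` holds (term, count) pairs with the most frequent term first. Real text mining goes well beyond raw counts, but the same principle of turning free text into countable features underlies it.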

What will your e-governance research achieve?

“E-government is a huge and constantly evolving environment. In an e-government environment, all management is done electronically. Comments written in the virtual environment, information taken and information about services received are collected and analysed. Currently, I am conducting research on e-government analysis with the application of text mining and social network techniques. This research focuses on methods to identify harmful social groups that may operate in that environment. The main objective is to analyse data collected in social networks, detect hotspot information, improve the efficiency of e-governance and increase citizen satisfaction. Governments can access vast amounts of relevant information important to their daily functions. This allows them to measure citizen satisfaction and dissatisfaction, make faster decisions, monitor those decisions and quickly enact changes if necessary.”

How has AzScienceNet played a role?

“It is impossible to handle the amount of data created today with yesterday’s resources. AzScienceNet has a special role in resolving these challenges. In recent years, with the support of GÉANT and the EaPConnect project, reconstruction work at AzScienceNet has played a significant role in making our research work more effective. Previously, storage problems kept emerging. Even though it was possible to process data in parts, it would take days to resolve those issues. Thanks to GÉANT and EaPConnect, using the opportunities AzScienceNet offers, it is possible to resolve these problems in reasonable time. If the technology and scientific methods are available, Big Data can be analysed and new knowledge can be gained. AzScienceNet is of great importance for Azerbaijani science and education. Scholars will continue to benefit from opportunities provided by GÉANT and EaPConnect through AzScienceNet.”

Further information

Ramiz Aliguliyev’s “Analysis of Big data collected in the e-government environment to improve the e-governance” project is funded by the Science Development Foundation under the President of the Republic of Azerbaijan. He is also currently working on a project funded by the State Oil Company of the Azerbaijan Republic (SOCAR), “Analysis of Big Data collected in the oil and gas sector”.