The methodology used in the project includes the following groups of methods:

(a) machine learning methods for clustering, prediction, classification and pattern recognition,
(b) financial methods for assessing financial compensation and long-term financial effects,
(c) methods of simulation modelling and mapping the flow of business processes.

The project will use machine learning methods, which have lately become more important in science for their advantages over standard statistical methods. They are also highly relevant given the development of cloud data processing and the Big Data concept, which emphasizes data analytics methods, especially predictive analytics, including machine learning (Kelleher, 2015; Zekić-Sušac, Has, 2016). In this project, machine learning will employ clustering as the first stage of modelling, with the key indirect objective of selecting possible data features and defining data vectors, in order to detect characteristic groups of buildings according to their energy state and energy consumption behaviour. Neural networks, decision trees, support vector machines, association rules, and other machine learning methods will be used in the second-stage modelling to identify important predictors of energy consumption, efficiency, cost, and payback period. Besides identifying the main predictors, a sensitivity analysis will be conducted to investigate the strength of their influence on the outputs (energy efficiency, costs, energy consumption, and payback period). Some patterns in the energy consumption behaviour of public-sector buildings will also be investigated by machine learning methods for pattern analysis (Akinlar, 2013; Scitovski, 2014). It will be investigated which groups of public-sector buildings follow similar patterns in energy consumption, and how those behaviour patterns influence the financial effects of their consumption.
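As an illustration, a minimal sketch of this two-stage approach (clustering followed by supervised modelling and a permutation-based sensitivity check) is given below in Python with scikit-learn. The input file, column names, and the random forest standing in for the neural network, tree, and SVM models named above are all hypothetical placeholders, not project specifications.

```python
# Sketch of two-stage modelling: (1) cluster buildings by energy-related
# features, (2) model consumption within each cluster and rank predictors.
# The file name and all column names are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("buildings.csv")  # hypothetical building records
features = ["area_m2", "construction_year", "occupancy", "degree_days"]
target = "annual_energy_kwh"

X = StandardScaler().fit_transform(df[features])

# Stage 1: detect characteristic groups of buildings.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Stage 2: per cluster, fit a predictive model and measure how strongly
# each feature influences the output (a simple sensitivity analysis).
for c in sorted(set(labels)):
    mask = labels == c
    model = RandomForestRegressor(random_state=0)
    model.fit(X[mask], df.loc[mask, target])
    imp = permutation_importance(model, X[mask], df.loc[mask, target],
                                 n_repeats=10, random_state=0)
    print(f"cluster {c}:", dict(zip(features, imp.importances_mean.round(3))))
```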
Machine learning methods discover relationships between variables by using various algorithms that aim to find similarities among data (similarity-based learning), minimize the learning error (error-based learning), or compute the probability of some events (probability-based learning) (Kelleher, 2015). According to Raschka (2016), machine learning algorithms include supervised learning, unsupervised learning, and reinforcement learning. The most frequently used machine learning methods are clustering (or grouping) of data (Theodoridis, 2009; Bagirov, 2011; Morales-Esteban et al., 2014; Kumar, 2015; Ordin, 2015), artificial neural networks, decision trees, association rules, support vector machines, genetic algorithms, the Naive Bayes classifier, the k-nearest neighbour method, and others. Among clustering methods, the project will use grouping into k clusters according to multiple features, where the clusters will be formed using the least squares or least absolute deviations criterion. The most appropriate number of clusters will be selected by several indicators, such as the objective function value, the Calinski–Harabasz index, and the Davies–Bouldin index. It is assumed that the vectors will have a high dimension n (the high-dimensional data problem) and that there will be m such vectors, where m will be a large number (the large high-dimensional data problem) (Kumar, 2015). The original data will first be normalized; for searching for the optimal partition into ellipsoidal clusters, Adaptive Mahalanobis Clustering will be used (Morales-Esteban et al., 2014), while for banana-shaped clusters, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) will be more appropriate (Ester, 1996; Karami, 2014).
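A minimal sketch, assuming scikit-learn is acceptable tooling, of selecting the number of clusters with the Calinski–Harabasz and Davies–Bouldin indices, and of DBSCAN for non-convex ("banana-shaped") clusters. Note that standard k-means uses the Euclidean metric; the Adaptive Mahalanobis Clustering cited above would require a dedicated implementation, and the synthetic data here merely stands in for normalized building data.

```python
# Selecting k via internal validity indices, then DBSCAN for non-convex
# ("banana-shaped") clusters. Synthetic data stands in for building data.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs, make_moons
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)  # normalize first, as in the project plan

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    ch = calinski_harabasz_score(X, labels)  # larger is better
    db = davies_bouldin_score(X, labels)     # smaller is better
    print(f"k={k}: Calinski-Harabasz={ch:.1f}, Davies-Bouldin={db:.2f}")

# DBSCAN recovers elongated clusters that k-means tends to split.
Xm, _ = make_moons(n_samples=400, noise=0.05, random_state=0)
moons = DBSCAN(eps=0.2, min_samples=5).fit_predict(Xm)
print("DBSCAN clusters found:", len(set(moons) - {-1}))
```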

Artificial neural networks can be described as programs or devices that are able to find relationships among variables by learning from past data, with the aim of producing outputs for new input data (Masters, 1995). Although first suggested as a concept in the 1950s, their frequent use in research began after Paul Werbos designed the backpropagation algorithm in 1982. In this project, several types and algorithms of neural networks will be used, including the multilayer perceptron and the radial basis function network, with backpropagation, conjugate gradient, Broyden–Fletcher–Goldfarb–Shanno, and other training algorithms (Zekić-Sušac et al., 2010). The support vector machines method was developed on the basis of the Generalized Portrait algorithm, suggested in the 1960s in Russia, and on the VC (Vapnik–Chervonenkis) theory (Smola and Schoelkopf, 2004). The method is primarily aimed at classification problems, although a modified version called Support Vector Regression is adjusted for regression prediction problems. Since its first application to optical character recognition, the method has been used in many domains (Papadimitriou, 2016).
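For illustration, a hedged sketch comparing a multilayer perceptron trained with scikit-learn's L-BFGS solver (the limited-memory relative of the Broyden–Fletcher–Goldfarb–Shanno algorithm mentioned above) against support vector regression on synthetic data; the hyperparameters are arbitrary examples, not tuned project settings.

```python
# Multilayer perceptron (L-BFGS solver) and support vector regression
# compared on a synthetic regression task.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(20,), solver="lbfgs",
                                 max_iter=2000, random_state=0))
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))

for name, model in [("MLP (L-BFGS)", mlp), ("SVR (RBF kernel)", svr)]:
    model.fit(X_train, y_train)
    print(name, "test R^2:", round(model.score(X_test, y_test), 3))
```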
Decision trees, i.e. classification trees, are frequently used methods in data mining that build a binary tree by splitting the input vectors at each node according to a function of a single input. The two most popular algorithms are discriminant-based univariate splits and classification and regression trees (CART or C&RT) (Lee, 2010). The k-nearest neighbour technique is used to classify the outcome of an input vector based on a selected number of its nearest neighbours. For a given input vector, the method estimates the outcome by finding the k examples that are closest in distance to the input (i.e. its neighbours). The most common distance measure is the Euclidean distance, while other possible metrics are the squared Euclidean, city-block, and Chebychev distances (Bishop, 2006).
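A short sketch of a CART-style classification tree and the k-nearest neighbour method under the distance metrics listed above (scikit-learn's "manhattan" metric is the city-block distance); the synthetic data stands in for labelled building records.

```python
# CART-style decision tree and k-NN with alternative distance metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
print("CART accuracy:", round(tree.score(X_test, y_test), 3))

# "manhattan" is the city-block distance; "chebyshev" is the Chebychev
# distance mentioned in the text.
for metric in ["euclidean", "manhattan", "chebyshev"]:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric).fit(X_train, y_train)
    print(f"k-NN ({metric}):", round(knn.score(X_test, y_test), 3))
```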
In order to realize the objective of assessing financial compensation and long-term financial effects, the classical methods of net present value (NPV) and internal rate of return (IRR) will be used, as well as models of the U.S. Environmental Protection Agency's Energy Star program; more concretely, models for calculating cash flow, evaluating building reconstructions, and financially quantifying increases in energy efficiency. In addition, further methods for assessing the curves of long-term financial effects will be investigated.
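As a worked illustration of the classical financial methods, a minimal NPV and IRR computation for a hypothetical retrofit cash flow; all figures are invented for the example.

```python
# Net present value and internal rate of return for a hypothetical
# retrofit: an upfront cost followed by annual energy-cost savings.
def npv(rate, cash_flows):
    """Discount a list of cash flows (year 0 first) at the given rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.9, hi=1.0, tol=1e-6):
    """Find the rate where NPV crosses zero, by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(lo, cash_flows) * npv(mid, cash_flows) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

flows = [-100_000] + [18_000] * 10  # invented investment and annual savings
print("NPV at 5%:", round(npv(0.05, flows), 2))
print("IRR:", round(irr(flows), 4))
print("Simple payback period (years):", round(100_000 / 18_000, 2))
```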

For the purpose of mapping the natural gas supply chain, the methods of the Business Process Modelling Language, Value Stream Mapping, causal maps, and/or the Ishikawa diagram will be used. A well-known mapping approach following the model of Barroso et al. (2011) will also be investigated. The mapping process starts with the creation of a visual and descriptive view of the main supply chain dimensions: entities, the relationships among entities in the supply chain, material flows, information flows, management policies, and lead times. By mapping the current state of the supply chain, the basic processes that need to be taken into account are presented, and improvements in the supply chain's resilience to internal or external deviations are investigated. Next follows the detection of potential disruptions in the supply chain, which are a major obstacle to its further optimization. In the next phase, strategies for reducing the problems caused by disruptions are selected. Those strategies will be based on coordinated and collaborative predictions of natural gas consumption, using the data analytics methods and models developed in the previous phases. After that, a new mapping follows, incorporating the consequences of each suggested strategy for supply chain performance improvement, together with a report for each scenario of supply chain mapping. In addition, on the basis of actual data and data from the simulation model, the method of dynamic programming will be used to extract and explain energy imbalances in the observed supply chain (Villada, Olaya, 2013; Zhang, 2016).
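As a hedged illustration of the dynamic-programming component, a toy storage-scheduling recursion: given per-period gas demand and purchase prices, choose how much to buy and store so that total cost is minimized. The instance and all numbers are invented; the project model would be driven by actual and simulated supply chain data.

```python
# Toy dynamic program: meet per-period gas demand at minimum purchase
# cost, using a small discrete storage. The instance is invented.
from functools import lru_cache

demand = [5, 7, 4, 8]          # units of gas required per period
price = [2.0, 3.5, 2.5, 4.0]   # purchase price per unit in each period
CAP = 6                        # storage capacity in units

@lru_cache(maxsize=None)
def min_cost(t, stock):
    """Cheapest way to satisfy demand from period t onward, given stock."""
    if t == len(demand):
        return 0.0
    best = float("inf")
    # Buy enough to cover demand and optionally refill storage.
    for end_stock in range(CAP + 1):
        buy = demand[t] + end_stock - stock
        if buy >= 0:
            best = min(best, buy * price[t] + min_cost(t + 1, end_stock))
    return best

print("Minimum total purchase cost:", min_cost(0, 0))
```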
Statistical tests of accuracy comparison (the t-test, McNemar's test, and others) will be used to select the best method for modelling the energy efficiency of public buildings. In addition, the integration of several methods will be suggested for the purpose of higher efficiency and cost reduction.
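A minimal sketch of McNemar's test for comparing two classifiers evaluated on the same test set, using statsmodels; the contingency counts are invented.

```python
# McNemar's test: do two classifiers' error patterns differ significantly?
# The 2x2 table counts test cases by whether each model was correct.
from statsmodels.stats.contingency_tables import mcnemar

table = [[52, 8],    # [A correct & B correct, A correct & B wrong]
         [20, 20]]   # [A wrong & B correct,   A wrong & B wrong]
result = mcnemar(table, exact=True)
print("statistic:", result.statistic, "p-value:", round(result.pvalue, 4))
# A small p-value indicates the two models perform differently.
```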

Topicality of methods

The described methods are highly topical in current scientific journals, and their role and influence on recent findings is becoming more important for discovering hidden relationships among data, i.e. data mining. Data mining has recently gained importance along with the popularity of the Big Data concept, where learning from data is crucial for generating new knowledge (Raschka, 2016) within the framework of data analytics. The described methods will enable the realization of the project objectives and the design of the proposed methodological framework for efficient energy management.

For conducting the above methodology, the project will use statistical software packages (R, Statistica), mathematical and simulation tools (Mathematica, MATLAB, Arena Simulation), and Big Data analytics tools (IBM Watson, Tableau). These tools enable creating and implementing new algorithms and testing their improvements over existing ones. After the models are created, there is the technical ability to implement them in the information system of an institution.