
Deshani K.A.D, Liyanage-Hansen L. and Attygalle D. (2019). Artificial Neural Network for Dynamic Iterative Forecasting: Forecasting Hourly Electricity Demand, American Journal of Applied Mathematics and Statistics, Vol. 7, No. 1, January 2019.

has been cited by the following article:


Clustering Time Related Data: A Regression Tree Approach

1 Department of Statistics, University of Colombo, Colombo 03, Sri Lanka

2 School of Computing, Engineering and Mathematics, University of Western Sydney, Campbelltown, Australia


American Journal of Applied Mathematics and Statistics. 2022, Vol. 10 No. 1, 22-27
DOI: 10.12691/ajams-10-1-4
Copyright © 2022 Science and Education Publishing

Cite this paper:
K.A.D. Deshani, Liwan Liyanage-Hansen, Dilhari T. Attygalle. Clustering Time Related Data: A Regression Tree Approach. American Journal of Applied Mathematics and Statistics. 2022; 10(1):22-27. doi: 10.12691/ajams-10-1-4.

Correspondence to: K.A.D. Deshani, Department of Statistics, University of Colombo, Colombo 03, Sri Lanka. Email: deshani@stat.cmb.ac.lk

Abstract

With the advancement of technology, vast time-related databases are created from a plethora of processes. Analyzing such data can be very useful, but because of the large volumes and their dependence on time, extracting useful information and implementing models can be complex and time-consuming. However, a comprehensive exploratory study that extracts hidden features of the data can mitigate this complexity to a great extent. Clustering is one way to extract such features, but it can be demanding with time-related data, especially when the series has a trend. This paper proposes an algorithm, based on a regression tree approach, to cluster a time series with a trend, along with other relevant variables. The importance of this algorithm lies in avoiding the misleading cluster allocations that can arise when a differenced time series is clustered. The algorithm first identifies a suitable consistent time window with no trend, then fits a separate regression tree for each window to obtain the clusters. By exploring the clusters generated from these trees, a general cluster formation suitable for all windows is identified. The approach is illustrated using hourly electricity demand in Sri Lanka for five consecutive years. Six meaningful clusters were identified based on the day of the week, specialty, and the time of the day. These cluster memberships provide useful additional information on the data structure, independent of the trend component, and can be used as an additional feature for improving model accuracy.
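The windowing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are synthetic, the two windows (`year_1`, `year_2`), the features (hour of day, weekend flag), the evening threshold and the small hand-rolled regression tree are all assumptions made for illustration. Each window has a different demand level (a trend across windows, none within a window), one tree is fitted per window, and the leaves of each tree are treated as clusters.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows, ys):
    """Return the (feature, threshold) pair that most reduces SSE, or None."""
    best, best_cost = None, sse(ys)
    for j in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-6:  # ignore floating-point noise
                best, best_cost = (j, t), cost
    return best

def fit_tree(rows, ys, depth):
    """Fit a depth-limited regression tree; leaves store the mean response."""
    split = best_split(rows, ys) if depth > 0 else None
    if split is None:
        return {"mean": mean(ys)}
    j, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    return {
        "split": (j, t),
        "left": fit_tree([r for r, _ in left], [y for _, y in left], depth - 1),
        "right": fit_tree([r for r, _ in right], [y for _, y in right], depth - 1),
    }

def leaf_rules(tree, path=()):
    """List the split conditions leading to each leaf (one leaf = one cluster)."""
    if "split" not in tree:
        return [path]
    j, t = tree["split"]
    return (leaf_rules(tree["left"], path + ((j, "<=", t),))
            + leaf_rules(tree["right"], path + ((j, ">", t),)))

def make_window(base):
    """Synthetic week of hourly demand; `base` is the window's demand level."""
    rows, ys = [], []
    for day in range(7):
        for hour in range(24):
            weekend = 1 if day >= 5 else 0
            rows.append((hour, weekend))  # features: (hour, weekend flag)
            ys.append(base + 300 * (hour >= 18) - 150 * weekend)
    return rows, ys

# Hypothetical windows: same within-window pattern, different overall level.
windows = {"year_1": make_window(1000), "year_2": make_window(1200)}
trees = {name: fit_tree(rows, ys, depth=2) for name, (rows, ys) in windows.items()}
rules_by_window = {name: leaf_rules(tree) for name, tree in trees.items()}

for name, rules in rules_by_window.items():
    print(name, rules)
```

In this toy setting the trees for both windows split on the same conditions even though their demand levels differ, which mirrors the abstract's step of looking for a general cluster formation that suits all windows. Real data would need a method for choosing trend-free windows and a way to reconcile trees that do not agree exactly.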

Keywords