¹Department of Statistics, University of Colombo, Colombo 03, Sri Lanka
²School of Computing, Engineering and Mathematics, University of Western Sydney, Campbelltown, Australia
American Journal of Applied Mathematics and Statistics. 2022, Vol. 10 No. 1, 22-27
DOI: 10.12691/ajams-10-1-4
Copyright © 2022 Science and Education Publishing
Cite this paper: K.A.D. Deshani, Liwan Liyanage-Hansen, Dilhari T. Attygalle. Clustering Time Related Data: A Regression Tree Approach. American Journal of Applied Mathematics and Statistics. 2022; 10(1):22-27. doi: 10.12691/ajams-10-1-4.
Correspondence to: K.A.D. Deshani, Department of Statistics, University of Colombo, Colombo 03, Sri Lanka. Email: deshani@stat.cmb.ac.lk
Abstract
With the advancement of technology, vast time-related databases are generated by a wide range of processes. Analyzing such data can be very useful, but because of their large volume and their dependence on time, extracting useful information and implementing models can be complex and time consuming. A comprehensive exploratory study that extracts hidden features of the data can mitigate this complexity to a great extent. Clustering is one way to extract such features, but it can be demanding with time-related data, especially when the series has a trend. This paper proposes an algorithm, based on a regression tree approach, to cluster a time series with a trend together with other relevant variables. The importance of this algorithm lies in avoiding the misleading cluster allocations that can result from clustering a differenced time series. The algorithm first identifies a suitable, consistent time window within which there is no trend, and then fits a separate regression tree to each window to obtain the clusters. By exploring the clusters generated by these trees, a general cluster formation suitable for all windows is identified. The approach is illustrated using hourly electricity demand in Sri Lanka for five consecutive years. Six meaningful clusters were identified based on the day of the week, specialty, and the time of the day. These cluster memberships provide useful additional information on the data structure, independent of the trend component, and can be used as an additional feature for improving model accuracy.
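The window-then-tree idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the windows are assumed to be known in advance, the feature names (`is_weekend`, `hour`) and all function names are hypothetical, and a small hand-rolled greedy regression tree stands in for whatever tree software the study used. The demand level rises from one window to the next (the trend), while the structure within each window is the same; the leaves of each per-window tree are treated as clusters and compared across windows.

```python
import random

FEATURES = ("is_weekend", "hour")  # hypothetical feature names


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, min_leaf):
    """Return the (feature index, threshold) that most reduces SSE, or None."""
    best, best_cost = None, sse(targets)
    for j in range(len(FEATURES)):
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            if min(len(left), len(right)) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-9:
                best, best_cost = (j, t), cost
    return best


def leaf_rules(rows, targets, depth, min_leaf=5, rule=()):
    """Grow a regression tree and return {rule text: mean target} per leaf."""
    split = best_split(rows, targets, min_leaf) if depth > 0 else None
    if split is None:
        return {" & ".join(rule) or "all": mean(targets)}
    j, t = split
    leaves = {}
    for op, keep in (("<=", lambda v: v <= t), (">", lambda v: v > t)):
        part = [(r, y) for r, y in zip(rows, targets) if keep(r[j])]
        leaves.update(leaf_rules([r for r, _ in part], [y for _, y in part],
                                 depth - 1, min_leaf,
                                 rule + (f"{FEATURES[j]}{op}{t}",)))
    return leaves


def make_window(level, rng, days=28):
    """Synthetic hourly demand for one trend-free window (assumed data)."""
    rows, targets = [], []
    for day in range(days):
        weekend = 1 if day % 7 >= 5 else 0
        for hour in range(24):
            peak = (15 if weekend else 40) if hour >= 18 else 0
            rows.append((weekend, hour))
            targets.append(level + 10 * (1 - weekend) + peak + rng.gauss(0, 1))
    return rows, targets


rng = random.Random(0)
rules_per_window = []
for w in range(3):
    # The level shifts between windows (trend); each window is fitted separately.
    rows, targets = make_window(100 + 50 * w, rng)
    leaves = leaf_rules(rows, targets, depth=2)
    rules_per_window.append(set(leaves))
    for rule, value in sorted(leaves.items()):
        print(f"window {w}: {rule:30s} mean={value:7.1f}")
```

In this toy setting the leaf means shift with the window level, but the leaf rules themselves are the same in every window, which is the sense in which cluster membership can be read independently of the trend. The six clusters reported for the Sri Lankan data are not reproduced here.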
Keywords