About me

Hi there, thanks for stopping by. I am a PhD candidate developing machine learning frameworks that learn and adapt continuously on data streams. Specifically, I approach this problem in two different settings, namely the intra-stream and the inter-stream setting.

Under the intra-stream setting, I investigate methods to detect, anticipate, and adapt to changes in system behaviour, specifically concept drifts. Concept drift occurs when the underlying data distribution changes over time, degrading the predictive performance of machine learning models. I approach this problem by exploiting recurring concept drifts: if a concept reappears, for example a particular weather pattern, previously learnt classifiers can be reused, improving the performance of the learning algorithm.
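Below is a minimal Python sketch of this reuse idea, assuming an external drift detector (such as ADWIN) signals drifts, and using scikit-learn's SGDClassifier as a stand-in incremental learner. The names (ConceptPool, on_drift, the reuse threshold) are hypothetical and not taken from my actual implementations.

```python
# Hypothetical sketch of recurring-concept reuse (not the PEARL codebase).
# A pool stores previously learnt classifiers; when drift is signalled, each
# stored classifier is scored on a recent window and reused if it does well.
from collections import deque
import numpy as np
from sklearn.linear_model import SGDClassifier

class ConceptPool:
    def __init__(self, window=200, reuse_threshold=0.7):
        self.pool = []                        # previously learnt classifiers
        self.active = SGDClassifier(loss="log_loss")
        self.recent = deque(maxlen=window)    # (x, y) evaluation buffer
        self.reuse_threshold = reuse_threshold

    def learn_one(self, x, y, classes):
        self.recent.append((x, y))
        self.active.partial_fit([x], [y], classes=classes)

    def on_drift(self):
        """Called when an external drift detector fires."""
        if hasattr(self.active, "classes_"):  # archive only fitted models
            self.pool.append(self.active)
        X = np.array([x for x, _ in self.recent])
        y = np.array([t for _, t in self.recent])
        best_acc, best_clf = 0.0, None
        if self.pool and len(self.recent) > 0:
            best_acc, best_clf = max(
                ((clf.score(X, y), clf) for clf in self.pool),
                key=lambda pair: pair[0])
        if best_clf is not None and best_acc >= self.reuse_threshold:
            self.active = best_clf            # the concept has recurred: reuse
        else:
            self.active = SGDClassifier(loss="log_loss")  # novel concept
```

In a real system the evaluation buffer would hold post-drift instances; scoring on the recent window here is a simplification.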

Under the inter-stream setting, I investigate strategies to transfer knowledge across data streams to improve predictive performance at the inception of a data stream, before any recurring concepts have been observed; this is known as the cold start problem. I address a major gap in existing research by considering the cost-effectiveness of concept transfer in the online context, where processing time is crucial because data instances continuously arrive at high speed.

Research Outputs

  • PEARL, a reactive recurrent concept drift classifier for intra-stream knowledge adaptation. PEARL combines exact and probabilistic techniques to determine concept transitions in the data stream, and keeps memory usage bounded with lossy counting (the counting idea is sketched after this list).
  • Nacre, a proactive recurrent concept drift classifier built on top of PEARL. By anticipating where the next change points will occur, it enables smoother transitions when the data distribution changes intra-stream, ultimately improving classification accuracy (a toy anticipation sketch also follows the list).
  • AOTrAdaBoost, an inter-stream boosting technique that enables transfer between identical and partially identical domains. It tunes the sensitivity of instance weighting during the boosting process so that the learning task can benefit from partially identical domains, where noise and dissimilarities in the data distributions may exist.
  • OPERA, a cost-effective transfer learning framework for inter-stream model adaptation. The framework decides whether a transferred model can be efficiently adapted to the target stream, in terms of performance gains, by measuring the cost of model adaptation. To measure this transfer cost, we propose the phantom tree algorithm, which measures the construction complexity of tree-based models. Phantom trees let us estimate the future potential of splits and decide whether the source model should be transferred across streams, or whether it is more cost-effective to build a new model in the target stream. To aid more effective adaptation, we also developed an incremental patching algorithm: the classifier identifies the regions of the instance space in which adaptations are needed, known as error regions, and trains local classifiers for these regions (sketched after this list).
  • scikit-ika, an open-source package with efficient implementations of all the algorithms above. Its design enables real-time learning and adaptation, both intra- and inter-stream, and provides reusable data stream analysis utilities for evaluation and application integration.
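As referenced in the PEARL item above, here is a generic sketch of the lossy counting algorithm (the textbook Manku and Motwani version, not PEARL's own code). It keeps the table of tracked items, such as observed concept transitions, bounded regardless of stream length.

```python
# Generic lossy counting: items whose counts fall below an error bound are
# pruned at each bucket boundary, so memory stays bounded on infinite streams.
from math import ceil

class LossyCounter:
    def __init__(self, epsilon=0.01):
        self.epsilon = epsilon
        self.width = ceil(1 / epsilon)   # bucket width
        self.n = 0                       # items seen so far
        self.counts = {}                 # item -> (count, bucket_at_insert)

    def current_bucket(self):
        return ceil(self.n / self.width)

    def add(self, item):
        self.n += 1
        count, delta = self.counts.get(item, (0, self.current_bucket() - 1))
        self.counts[item] = (count + 1, delta)
        if self.n % self.width == 0:     # end of bucket: prune rare items
            b = self.current_bucket()
            self.counts = {k: (c, d) for k, (c, d) in self.counts.items()
                           if c + d > b}

    def frequent(self, support):
        """Items with estimated frequency at least `support` of the stream."""
        threshold = (support - self.epsilon) * self.n
        return [k for k, (c, _) in self.counts.items() if c >= threshold]
```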
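The Nacre item mentions anticipating the next change point. As a toy stand-in for that idea (deliberately naive, and not Nacre's actual predictor), the next drift position can be estimated from the mean interval between recently observed drifts:

```python
# Naive drift-point anticipation: estimate the next change point from the
# mean spacing of past drift positions. Illustrative only.
from collections import deque

class DriftAnticipator:
    def __init__(self, history=10):
        self.drift_points = deque(maxlen=history)  # stream positions of past drifts

    def record_drift(self, position):
        self.drift_points.append(position)

    def next_drift_estimate(self):
        if len(self.drift_points) < 2:
            return None                            # not enough history yet
        pts = list(self.drift_points)
        intervals = [b - a for a, b in zip(pts, pts[1:])]
        return pts[-1] + sum(intervals) / len(intervals)
```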
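Finally, a hypothetical simplification of the incremental patching idea from the OPERA item: a binary "error region" model learns where the transferred model tends to err, and a local patch classifier handles those regions. SGDClassifier again stands in for the incremental learners; the class and method names are illustrative only.

```python
# Hypothetical sketch of incremental patching (not OPERA's implementation).
import numpy as np
from sklearn.linear_model import SGDClassifier

class PatchedModel:
    def __init__(self, transferred, classes):
        self.base = transferred                       # fitted model from the source stream
        self.classes = list(classes)
        self.region = SGDClassifier(loss="log_loss")  # predicts: will base err here?
        self.patch = SGDClassifier(loss="log_loss")   # local classifier for error regions
        self.region_ready = False
        self.patch_ready = False

    def learn_one(self, x, y):
        X = np.array([x])
        err = int(self.base.predict(X)[0] != y)       # 1 if base misclassified x
        self.region.partial_fit(X, [err], classes=[0, 1])
        self.region_ready = True
        if err:                                       # train the patch on error regions
            self.patch.partial_fit(X, [y], classes=self.classes)
            self.patch_ready = True

    def predict_one(self, x):
        X = np.array([x])
        if (self.region_ready and self.patch_ready
                and self.region.predict(X)[0] == 1):
            return self.patch.predict(X)[0]           # route to the local patch
        return self.base.predict(X)[0]                # otherwise trust the base model
```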

Publications

Ocean Wu, Yun Sing Koh, Gillian Dobbie, and Thomas Lacombe. Cost-effective transfer learning for data streams. In 2022 IEEE International Conference on Data Mining, pages 1233–1238, Dec 2022

Ocean Wu, Yun Sing Koh, Gillian Dobbie, and Thomas Lacombe. Probabilistic exact adaptive random forest for recurrent concepts in data streams. International Journal of Data Science and Analytics, 13(1):17–32, Jan 2022

Ocean Wu, Yun Sing Koh, Gillian Dobbie, and Thomas Lacombe. Transfer learning with adaptive online TrAdaBoost for data streams. In Vineeth N. Balasubramanian and Ivor Tsang, editors, Proceedings of The 13th Asian Conference on Machine Learning, volume 157 of Proceedings of Machine Learning Research, pages 1017–1032. PMLR, 17–19 Nov 2021

Ocean Wu, Yun Sing Koh, Gillian Dobbie, and Thomas Lacombe. Nacre: Proactive recurrent concept drift detection in data streams. In International Joint Conference on Neural Networks, pages 1–8, 2021

Ocean Wu, Yun Sing Koh, Gillian Dobbie, and Thomas Lacombe. PEARL: Probabilistic exact adaptive random forest with lossy counting for data streams. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 17–30. Springer, 2020

Ocean Wu, Yun Sing Koh, and Giovanni Russello. GPU-based state adaptive random forest for evolving data streams. In International Joint Conference on Neural Networks, pages 1–8, 2020