Chao Luo

University: University of Technology, Sydney
Supervisor: N/A

Research Question:

  • How can data mining technology be used to assist stock market surveillance?

Data mining is the process of analysing data from different perspectives and summarising it into useful information. This information can be converted into knowledge about historical patterns and future trends. In general, data mining involves tasks such as classification, clustering, association rule mining, outlier mining and sequential pattern mining. Market surveillance is critical to designing market models and business rules, and to maintaining market integrity, transparency and fairness.

Research Motivation:

Existing market surveillance systems usually rely on surveillance rules to raise alerts on suspect findings in the market. Most of these rules are predefined and based on business rules, while some are derived from statistics and reporting results, which can capture more sophisticated abnormal trading behaviour and market movements. There is therefore a crucial need to develop workable methods for smart surveillance. To address this need, we investigate the use of advanced technologies, such as data mining, to assist stock market surveillance.

High-level Research Design:

In this research, we explore the potential application of outlier mining, multiple time series analysis and subspace clustering in market surveillance.

The objective of outlier mining is to find the data objects that are grossly different from, or inconsistent with, the majority of the data. Outlier mining technologies have been used to detect market manipulation and insider trading. In financial market data, outliers are highly intermixed with normal data, and it is difficult to judge whether an object is an outlier or not. A more effective and more efficient approach is therefore needed.
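To make the idea of an object being "grossly different from the majority" concrete, here is a minimal sketch of one common robust test, the modified z-score based on the median and the median absolute deviation (MAD). It is an illustration only, not the method developed in this research; the function name `mad_outliers`, the threshold of 3.5 and the sample returns are all assumptions chosen for the example.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds `threshold`."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]

# Hypothetical daily returns in percent; the value 9.8 is planted as an anomaly.
daily_returns = [0.2, -0.1, 0.3, 0.0, -0.2, 9.8, 0.1, -0.3, 0.2, 0.1]
flagged = mad_outliers(daily_returns)
print(flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely shifts them, which matters when outliers are intermixed with normal data.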

A stock market has multiple measures, such as price, volume and volatility, and each measure forms a time series. We proposed two models for multiple time series: voting-based outlier mining on multiple time series and probability-based outlier mining on multiple time series.
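The source does not spell out how the voting-based model works, so the following is only one plausible reading, offered as a sketch: each measure's time series is scored independently, every measure "votes" for the time points it considers abnormal, and a point is reported when enough measures agree. The per-series detector (a median/MAD score), the names `is_outlier` and `vote_outliers`, the `min_votes` parameter and the sample data are all assumptions.

```python
from statistics import median

def is_outlier(series, threshold=3.5):
    """Per-point boolean flags using the modified z-score (median/MAD)."""
    med = median(series)
    mad = median([abs(v - med) for v in series])
    if mad == 0:
        return [False] * len(series)
    return [0.6745 * abs(v - med) / mad > threshold for v in series]

def vote_outliers(measures, min_votes=2):
    """Return {time index: [measures that voted]} for points with enough votes.

    `measures` maps a measure name to its time series; all series are
    assumed to be aligned and of equal length.
    """
    votes = {}
    for name, series in measures.items():
        for t, flag in enumerate(is_outlier(series)):
            if flag:
                votes.setdefault(t, []).append(name)
    return {t: names for t, names in votes.items() if len(names) >= min_votes}

# Hypothetical aligned series for one stock (illustrative values only).
measures = {
    "price":      [10.0, 10.1, 10.0, 10.2, 12.5, 10.1, 10.0, 10.1],
    "volume":     [500, 520, 510, 495, 2400, 505, 1500, 515],
    "volatility": [0.8, 0.9, 0.8, 0.85, 3.1, 0.9, 0.8, 0.85],
}
suspects = vote_outliers(measures)
print(suspects)
```

In this sample, time point 4 is abnormal in all three measures and is reported, while the volume spike at point 6 receives a single vote and is dropped. Requiring agreement across measures is one way to reduce alerts caused by noise in a single series.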

Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. A cluster is intended to group objects that are related, based on observations of their attributes' values. However, given a large number of attributes, some of the attributes will usually not be meaningful for a given cluster. Subspace clustering is the task of detecting all clusters in all subspaces. This means that a point might be a member of multiple clusters, each existing in a different subspace.
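The point that one object can belong to clusters in several subspaces can be seen in a small grid-density sketch. It is loosely inspired by grid-based subspace clustering but deliberately simplified: it enumerates every subspace up to `max_dims` by brute force and reports dense grid cells only, without merging neighbouring cells into larger clusters or pruning the search. The function name `dense_units`, its parameters and the sample points are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def dense_units(points, bin_width=1.0, min_points=3, max_dims=2):
    """Map (subspace, cell) -> point indices for every dense grid cell.

    A subspace is a tuple of attribute indices; a cell is the tuple of
    bin numbers of a point projected onto that subspace. A cell is dense
    when it holds at least `min_points` points.
    """
    n_dims = len(points[0])
    result = {}
    for k in range(1, max_dims + 1):
        for dims in combinations(range(n_dims), k):
            cells = defaultdict(list)
            for idx, p in enumerate(points):
                cell = tuple(int(p[d] // bin_width) for d in dims)
                cells[cell].append(idx)
            for cell, members in cells.items():
                if len(members) >= min_points:
                    result[(dims, cell)] = members
    return result

# Hypothetical objects described by three attributes.
points = [
    (1.2, 5.1, 9.3),
    (1.4, 5.3, 2.7),
    (1.1, 8.6, 9.5),
    (6.7, 5.4, 9.1),
    (3.9, 5.2, 9.8),
    (8.3, 0.4, 4.6),
]
units = dense_units(points)
for (dims, cell), members in units.items():
    print(dims, cell, members)
```

With this data, point 0 falls in a dense cell of the subspace made of attribute 0 alone, together with points 1 and 2, and also in a dense cell of the subspace made of attributes 1 and 2, together with points 3 and 4. No dense cell exists in the full combination of attributes 0 and 1, which is why a clustering restricted to a single fixed attribute set could miss these groups.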