Overview
This document contains a list of possible honours projects in financial text mining with the Capital Markets Cooperative Research Centre (CMCRC).
The Capital Markets Cooperative Research Centre
CMCRC is a $100 million facility backed by the Australian Federal Government, with a track record of commercial success in developing capital markets technologies. In addition to finance and information technology researchers from six universities, the CMCRC consortium includes 21 industry partners, among them securities exchanges and related technology and data providers.
http://www.cmcrc.com
The Projects
The proposed projects are listed here. All projects will be carried out in conjunction with the CMCRC and could involve CMCRC industrial partners. Furthermore, all projects may attract scholarships of $5000. Keep an eye on this page for updates concerning partners and scholarships associated with particular projects.
Supervision
Possible supervisors are listed here. Each project will have two Macquarie staff as official supervisors and, in addition, a third supervisor from the CMCRC.
Contact
Feel free to email with any questions:
- Ben Hachey <bhachey AT cmcrc DOT com>
- Jean-Yves Delort <jydelort AT cmcrc DOT com>
- Diego Mollá Aliod <diego DOT molla-aliod AT mq DOT edu DOT au>
Aggregation and Presentation for Trading and Surveillance
Explaining price-sensitivity based on classifier decisions
Text classification technology provides a means for predicting whether a document is price-sensitive (i.e., whether it will impact stock prices). However, it does not provide a means for verifying system predictions. The goal of this project is to use sentence extraction techniques from the summarisation literature to provide a brief explanation of classifier output, so that a human analyst can quickly verify whether a prediction is sound before acting on it.
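As a rough illustration of the idea, the sketch below ranks the sentences of a document by how much they would contribute to a linear bag-of-words classifier's score and returns the top one as an explanation. It is only a minimal sketch under stated assumptions: the word weights, the function names and the sentence splitter are all hypothetical, and a real project would take the weights from a trained classifier.

```python
import re

# Hypothetical word weights standing in for a trained linear classifier;
# positive values push a document towards the "price-sensitive" class.
WEIGHTS = {"profit": 1.2, "downgrade": 1.5, "takeover": 2.0, "meeting": -0.3}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_score(sentence, weights):
    """Sum the classifier weights of the words that occur in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(weights.get(tok, 0.0) for tok in tokens)


def explain(text, weights, k=1):
    """Return the k sentences that contribute most to the classifier score."""
    sentences = split_sentences(text)
    ranked = sorted(sentences, key=lambda s: sentence_score(s, weights), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    doc = ("The annual meeting was held on Monday. "
           "The board received a takeover offer. "
           "Lunch was served.")
    print(explain(doc, WEIGHTS))
```

An analyst shown the extracted sentence alongside the prediction can judge quickly whether the classifier reacted to something meaningful.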
Explaining finance relations
In capital markets, it is easy to identify associations between companies by testing whether their share prices tend to move together. However, this information is not necessarily useful to a trader or surveillance analyst unless they can identify a causal or co-effect relation between the two companies. This project will explore techniques for aggregating and presenting relation type information from various structured and unstructured sources (e.g., industry classifications, index membership, textual descriptions in news or other documents).
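One simple way to test whether two share prices move together is to correlate their period-to-period returns. The sketch below assumes two equal-length price series and uses Pearson correlation; the function names and the price figures are made up for illustration, and the project itself is about explaining such associations rather than computing them.

```python
from math import sqrt
from statistics import mean


def returns(prices):
    """Simple returns between consecutive prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up price series, purely for illustration.
    company_a = [10.0, 11.0, 12.1, 11.0]
    company_b = [20.0, 22.0, 24.2, 22.0]
    print(pearson(returns(company_a), returns(company_b)))
```

A high correlation only flags an association; the relation type (for example, same industry or same index) still has to come from the other sources described above.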
Biographical sketches for finance entities
In addition to interpreting relations between entities like companies and people, an analyst needs to be able to find out quickly who the entities are and what they do. This project will explore the use of techniques from the summarisation literature for aggregating and presenting biographical information about finance entities from various sources (e.g., company announcements, news, forums).
Improving Commercial Information Retrieval for E-mail
Conceptual search
Conceptual search aims to improve the user experience by organising search results by topic or concept. This is commonly done by automatic clustering or by grouping results with respect to a pre-determined taxonomy. The goal of this project is to add conceptual search to an existing commercial search engine. The work will be carried out in collaboration with Nuix, who will provide the data set. A $5000 scholarship is attached to this project.
http://www.nuix.com/
Explaining e-mail relations
Current search suites provide tools for visualising an organisation's email networks. However, these tools do not describe the type of relationship that exists between email partners, which would make the network easier to interpret and allow filtering by relation type. The goal of this project is to develop a system for automatically identifying relation types for email partners. The work will either use the Enron email data set or be carried out in collaboration with an industry partner.
Tools for Assisted Curation of Financial Databases
Extracting board members from company announcements
Current financial information providers often pay human analysts to read company announcements and extract information such as board membership and top shareholders. The goal of this project will be to induce an automatic system to perform these tasks. The work will either use existing annotated data at the CMCRC or be carried out in collaboration with an industry partner.
Extracting profit/loss information from company announcements
Financial information providers also curate databases of profit/loss information. The goal of this project will be to induce an automatic system to extract and normalise this information (e.g., forecasted and actual earnings figures). The work will either use existing annotated data at the CMCRC or be carried out in collaboration with an industry partner.
General Financial Information Extraction Problems
Inducing name/term identification from metadata
The Reuters News Archive (RNA) is a large corpus of newswire data that is richly annotated at the document level with metadata such as company names. The goal of this project is to induce a system to automatically identify such terms by mapping entities in the human-authored metadata to actual character strings in the raw text.
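The simplest form of this mapping is an exact string search: each name in the document-level annotation is located in the raw text, and the resulting character spans can serve as noisy training labels. The sketch below assumes that simple setting; the function name and the example sentence are hypothetical, and it says nothing about the actual format of the RNA annotations.

```python
import re


def find_mentions(text, names):
    """Return sorted (start, end, name) spans for each exact, whole-word match."""
    spans = []
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), name))
    return sorted(spans)


if __name__ == "__main__":
    # Made-up sentence and annotation, purely for illustration.
    text = "BHP Billiton shares rose after BHP Billiton announced results."
    for start, end, name in find_mentions(text, ["BHP Billiton"]):
        print(start, end, text[start:end])
```

Exact matching misses variants such as abbreviations, which is where a learnt identification system would be expected to improve on this baseline.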
Matching company names using learnt similarity measures
Company names can be referenced in various ways in text (e.g., BHP, BHP Billiton, BHP Billiton Limited, BHP Ltd). While named entity recognition technology is capable of identifying these mentions with high accuracy, it does not always produce a direct match to the registered company names associated with stock ticker codes. The goal of this project is to build a system that automatically matches different references to the same company. An interesting approach is to use machine learning to train a specialised string similarity measure.
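For comparison, a fixed (not learnt) baseline might normalise names by dropping legal suffixes and then score candidates with a generic string similarity. The sketch below does this with Python's standard difflib module. The suffix list, the function names and the list of registered names are assumptions made for illustration; a learnt measure would replace the fixed similarity function.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, incomplete list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"limited", "ltd", "plc", "inc", "corporation", "corp"}


def normalise(name):
    """Lowercase, drop punctuation and remove legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(mention, registered):
    """Pick the registered name most similar to the mention."""
    return max(registered, key=lambda name: similarity(mention, name))


if __name__ == "__main__":
    # Illustrative register; a real one would come from exchange listings.
    registered = ["BHP Billiton Limited", "Rio Tinto Limited"]
    print(best_match("BHP Ltd", registered))
```

A learnt measure could, for instance, weight edits differently for abbreviations or dropped tokens, which a generic measure treats the same as any other difference.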
Analysis of sentiment classification and market behaviour
Sentiment analysis aims to automatically determine whether a text (e.g., from company announcements, news, forums) is favourably disposed towards its subject. The goal of this project is to investigate the relationship between sentiment engine output and market behaviour.
CMCRC-Macquarie Supervision
Dr Jean-Yves Delort, Research Fellow
Jean-Yves Delort has a PhD in Computer Science from the University of Pierre and Marie Curie (Paris 6). Before joining the CMCRC, Jean-Yves was Senior Lecturer in Computer Science at the University of Montpellier and a member of the research group on Hypermedia and Human Computer Interaction. Jean-Yves’ core interests are in information retrieval and human-computer interaction, with a focus on automatic summarisation and visualisation. Jean-Yves will serve as a first or second supervisor on the projects listed here.
http://web.science.mq.edu.au/~jydelort/
Dr Ben Hachey, Research Fellow
Ben Hachey has a PhD in Informatics from the University of Edinburgh, where he was a member of the Language Technology Group for 6 years as a Research Associate and then a student. Ben’s core interests are in building usable text analytics tools, with a focus on information extraction, automatic summarisation and information aggregation, minimally supervised machine learning and evaluation. Ben will serve as a first or second supervisor on the projects listed here.
http://web.science.mq.edu.au/~bhachey/
Macquarie Supervision
Dr Diego Mollá Aliod, Senior Lecturer
Diego Mollá Aliod has a PhD in Linguistics from the University of Edinburgh. Diego’s interests are centred on the application of theoretical linguistics to specific real-world problems, in particular automated text-based question answering. Diego may serve as a first or second supervisor on the projects listed here.
http://web.science.mq.edu.au/~diego/
CMCRC Project Coordination
Dr Maria Milosavljevic, CTO
Maria Milosavljevic has 20 years of experience in language technology and knowledge-based systems. She was awarded a PhD in Language Technology from the Microsoft Research Institute at Macquarie University. Prior to joining the CMCRC, Maria held research roles at CSIRO, the University of Edinburgh and Macquarie University. Maria's main area of interest is the use of text analytics in fraud detection and intelligence. Maria will serve as a third supervisor on the projects listed here.
http://web.science.mq.edu.au/~mariam/