Aggregation and Presentation for Trading and Surveillance
Explaining price-sensitivity based on classifier decisions Text classification technology provides a means for predicting whether a document is price-sensitive (i.e., whether it will impact stock prices). However, it does not provide a means for verification of system predictions. The goal of this project is to use sentence extraction techniques from the summarisation literature to provide a brief explanation of classifier output so that a human analyst can quickly verify whether a prediction is sound before acting on it.
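One way to picture the idea is the sketch below, which ranks the sentences of a document by how much they contribute to a classifier's decision and returns the top ones as the explanation. It assumes a linear bag-of-words classifier with one weight per term. The project description does not specify the classifier, and the `weights` values, the example document and all function names are hypothetical.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score_sentence(sentence, weights):
    # Sum the (assumed) classifier weights of the terms in the sentence.
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(weights.get(t, 0.0) for t in tokens)

def explain(text, weights, k=1):
    # Return the k sentences that contribute most to the classifier score.
    sentences = split_sentences(text)
    ranked = sorted(sentences, key=lambda s: score_sentence(s, weights), reverse=True)
    return ranked[:k]

# Hypothetical term weights for the "price-sensitive" class.
weights = {"profit": 1.5, "downgrade": 2.0, "guidance": 1.2, "meeting": 0.1}
doc = ("The annual meeting will be held in Sydney. "
       "The company issued a profit downgrade and cut guidance. "
       "Refreshments will be served.")
summary = explain(doc, weights, k=1)
print(summary)
```

An analyst would read only the extracted sentence to judge whether the prediction is plausible. A real system would likely need a proper sentence splitter and a scoring scheme matched to the actual classifier.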
Explaining finance relations In capital markets, it is easy to identify associations between companies by testing whether their share prices tend to move together. However, this information is not necessarily useful to a trader or surveillance analyst unless they can identify a causal or co-effect relation between the two companies. This project will explore techniques for aggregating and presenting relation type information from various structured and unstructured sources (e.g., industry classifications, index membership, textual descriptions in news or other documents).
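The co-movement test mentioned above can be sketched as a Pearson correlation between the return series of two stocks. This is only one possible association measure and the description does not name a specific one. The price series and function names below are made up for illustration.

```python
import math

def simple_returns(prices):
    # Period-over-period relative price changes.
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length series.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)

# Hypothetical closing prices; the second series is exactly twice the first,
# so the two return series coincide.
prices_a = [10.0, 10.5, 10.2, 10.8, 11.0]
prices_b = [20.0, 21.0, 20.4, 21.6, 22.0]
corr = pearson(simple_returns(prices_a), simple_returns(prices_b))
print(round(corr, 6))
```

A high correlation only flags an association. It says nothing about why the two companies move together, which is the gap this project aims to fill.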
Biographical sketches for finance entities In addition to interpreting relations between entities like companies and people, an analyst needs to be able to find out quickly who the entities are and what they do. This project will explore the use of techniques from the summarisation literature for aggregating and presenting biographical information about finance entities from various sources (e.g., company announcements, news, forums).
Improving Commercial Information Retrieval for E-mail
Conceptual search Conceptual search aims to improve the user experience by organising search results by topic or concept. This is commonly done by automatic clustering or by grouping results with respect to a pre-determined taxonomy. The goal of this project is to add conceptual search to an existing commercial search engine. The work will be carried out in collaboration with Nuix, who will provide the data set. A scholarship of $5000 is attached to this project.
http://www.nuix.com/
Explaining e-mail relations Current search suites provide tools for visualising email networks for an organisation. However, these tools do not describe the type of relationship that exists between email partners, which would make the network easier to interpret and allow filtering by relation type. The goal of this project is to develop a system for automatically identifying relation types for email partners. The work will either use the Enron email data set or be carried out in collaboration with an industry partner.
Tools for Assisted Curation of Financial Databases
Extracting board members from company announcements Current financial information providers often pay human analysts to read company announcements and extract information such as board membership and top shareholders. The goal of this project will be to induce an automatic system to perform these tasks. The work will either use existing annotated data at the CMCRC or it will be carried out in collaboration with an industry partner.
Extracting profit/loss information from company announcements Financial information providers also curate databases of profit/loss information. The goal of this project will be to induce an automatic system to extract and normalise this information (e.g., forecasted and actual earnings figures). The work will either use existing annotated data at the CMCRC or it will be carried out in collaboration with an industry partner.
General Financial Information Extraction Problems
Inducing name/term identification from meta data The Reuters News Archive (RNA) is a large corpus of newswire data that is richly annotated at the document level with meta data such as company names. The goal of this project is to induce a system to automatically identify such terms by mapping entities in the human-authored meta data to actual character strings in the raw text.
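As a rough illustration of the mapping step, the sketch below projects document-level company names onto character offsets in the raw text using case-insensitive exact matching. The resulting spans could then serve as training data for a term identifier. The example text, the metadata list and the function name are hypothetical, and exact matching is only a starting point, since it will miss variant forms of a name.

```python
import re

def project_metadata(text, company_names):
    # Find every case-insensitive occurrence of each metadata name in the text
    # and return (start, end, name) character spans, sorted by position.
    spans = []
    for name in company_names:
        for m in re.finditer(re.escape(name), text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), name))
    return sorted(spans)

# Hypothetical document text and document-level metadata.
text = ("BHP Billiton reported record profit. "
        "Analysts said BHP Billiton may raise dividends.")
metadata = ["BHP Billiton"]
spans = project_metadata(text, metadata)
for start, end, name in spans:
    print(start, end, text[start:end])
```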
Matching company names using learnt similarity measures Company names can be referenced in various ways in text (e.g., BHP, BHP Billiton, BHP Billiton Limited, BHP Ltd). While named entity recognition technology is capable of identifying these mentions with high accuracy, it does not always result in a direct match to the registered company names associated with stock ticker codes. The goal of this project is to build a system that automatically matches different references to the same company. An interesting approach is to use machine learning to train a specialised string similarity measure.
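For comparison, a hand-built baseline (not the learnt measure proposed here) might normalise names by stripping legal suffixes and then score pairs with a generic string similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The suffix list, the registered names and the function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "incorporated", "plc", "corp", "corporation", "co"}

def normalise(name):
    # Lowercase, drop punctuation, and remove legal-form tokens.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def similarity(a, b):
    # Generic character-level similarity in [0, 1] on the normalised names.
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def best_match(mention, registered_names):
    # Return the (registered name, score) pair with the highest similarity.
    return max(((name, similarity(mention, name)) for name in registered_names),
               key=lambda pair: pair[1])

# Hypothetical register of company names.
registered = ["BHP Billiton Limited", "Rio Tinto Limited"]
for mention in ["BHP", "BHP Billiton", "BHP Ltd"]:
    print(mention, "->", best_match(mention, registered))
```

A learnt measure would replace the fixed suffix list and the generic ratio with parameters estimated from labelled pairs of matching and non-matching names.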
Analysis of sentiment classification and market behaviour Sentiment analysis aims to automatically determine whether a text (e.g., from company announcements, news, forums) is favourably disposed towards its subject. The goal of this project is to investigate the relationship between sentiment engine output and market behaviour.