All the information you need to know about each project. You will be asked to select one project from each session. If you have any questions about the projects, please submit them using this form: Questions

Projects Information Session
Document: SwDS Dissertation (967.99 KB / PDF)

Session 1

Anomaly Detection with Bayesian Neural Networks
Lloyds Banking Group
Summary
Anomaly detection (also called outlier detection) is the identification of items, events or observations that do not conform to an expected pattern or to other items in a data set. It has a wide range of use cases in industry, including fraud detection and malicious content removal. Anomaly detection is used at Lloyds for many of these applications, but the detection of cyber and other e-crime threats is of particular interest, as it helps keep the bank as secure as possible from criminal threats for our customers and colleagues.
Information Document: Anomaly Detection (682.28 KB / PDF)

Understanding forest damage in Germany: Finding key drivers to help with future forest conversion of climate sensitive stands
Department of Soil and Environment, Forest Research Institute Baden-Wuerttemberg, Freiburg, Germany
Summary
In this project you will analyse yearly forest health monitoring data for the main species, made available by the Forest Research Institute Baden-Wuerttemberg, in order to identify the site characteristics (topography, soil, water budget and climate) that are associated with damage. The aim is to describe the nature of the associations and the ranges of the variables that give optimal conditions. The outcome of the analysis will help to formulate hypotheses regarding the causes of damage.
Information Document: Understanding Forest Damage (689.08 KB / PDF)

Understanding the drivers behind future demand for hospital services in Scotland to support the development of wider whole system modelling
Public Health Scotland
Summary
Public Health Scotland is developing a number of models to represent the health and social care system. The goal is to combine these models over time to create a whole system view. This view will be used to support the long-term planning of services and workforce, and to understand the long-term demands that the population will place on the system.
Information Document: Whole system Modelling (499 KB / PDF)

Reselling second-hand items: Identifying the likelihood of stuff being sold online
Thrift
Summary
Allowing the masses to easily sell their unwanted or unused items not only helps reduce the huge amount of waste that is thrown into landfill each year, but also lets people free up cash that is tied up in ‘stuff’ they do not need, and provides economic opportunity to those who are less financially fortunate. The complexity of the market offers us an opportunity to simplify things for people. THRIFT collects data from eBay, Depop, Facebook Marketplace, Amazon, and Etsy; our proprietary algorithm then makes sense of that data to give the user a market price for any item.
Information Document: Reselling second hand items (642.36 KB / PDF)

Session 2

How does DNA sequence specify gene expression in a timecourse of fungal growth?
University of Edinburgh – Cell Biology
Summary
Cells contain thousands of genes, whose "expression" changes dynamically to ensure that the cells have the right amounts of the proteins they need to thrive and grow in a dynamic environment. Each gene can be thought of as a region of DNA sequence (modelled as a string of the letters A, C, G and T), some of which is transcribed into RNA; some of this messenger RNA is then translated into protein.
A fundamental task in molecular biology is to understand how gene sequence specifies the amounts of RNA and protein in any given condition. Statistically, this can be thought of as a high-dimensional model fitting problem in which the inputs are strings and the outputs are numerical variables or timecourses of gene expression; because the problem is high-dimensional, dimension reduction methods such as feature selection and clustering are crucial [Eisen 1998]. This project asks which sequence features best explain gene expression patterns in the fungal pathogen Cryptococcus neoformans as it "wakes up", resuming growth after a period of quiescence. Cryptococcus is a major pathogen of immunocompromised humans [May 2016], and the goal of this experiment is to understand the initial stages of infection, when the cells "wake up" in a human lung.
Information Document: Fungal Growth (109.01 KB / PDF)

Hidden Markov models as tools to identify seabird foraging areas
Joint Nature Conservation Committee
Summary
Seabird populations are in decline worldwide (Paleczny et al. 2015). To mitigate these declines, governments have designated marine protected areas for seabirds and have regulated the development of offshore renewables in order to limit their ornithological impacts. Both measures rely on the identification of important foraging areas. Researchers typically identify foraging areas by applying hidden Markov models (HMMs) to GPS data that track the behaviour of individuals, using the ‘moveHMM’ R package (Michelot et al. 2016). Little has been done to validate these models, as data on known foraging locations are difficult to collect.
Information Document: Hidden Markov Models (625.56 KB / PDF)

Neural De-Duplication and Record Linkage
Amazon
Summary
Record Linkage or Entity Resolution is the task of grouping entities across one or more data sources.
It is commonly used to improve data quality and integrity, to allow existing data sources to be re-used for new studies, and to reduce the cost and effort of data acquisition. Applications include matching bibliographic and e-commerce data entities, matching business records and predicting duplicate ads. Within Amazon, these methods are used to find relationships between products; these relationships are then used to drive search and product discoverability, and to improve the shopping experience on the website.
Information Document: Record Linkage (256.64 KB / PDF)

PISA 2015 and 2018: Comparison of educational attainment between England, Scotland, Wales and Northern Ireland
PISA (Programme for International Student Assessment)
Summary
PISA (Programme for International Student Assessment) is the Organisation for Economic Co-operation and Development’s (OECD) programme for comparing educational attainment among 15-year-old students all over the world. Exams assessing the reading, mathematics and science knowledge and skills needed to meet real-life challenges take place every three years, in 79 countries as of 2018. In addition to the exams, the students, teachers, and parents complete questionnaires that capture aspects of the students’ backgrounds and school environments that may be related to educational outcomes. The PISA study uses systematic probability proportional to size (PPS) sampling. Schools are assigned to strata based on school characteristics.
Information Document: PISA (467.24 KB / PDF)

This article was published on 2025-02-26