We are inviting applications for a PhD studentship in statistical epidemiology, funded by GlaxoSmithKline for 3 years, starting at a mutually convenient date. The studentship is open to EU citizens and covers tuition fees and a generous living allowance.

The project concerns statistical methods in pharmaco-epidemiology, using large primary care databases such as the General Practice Research Database (GPRD), as described in the project outline below. The project supervisor is James Carpenter, and the successful candidate will collaborate with statisticians and epidemiologists at both LSHTM and GSK to develop a stimulating doctoral program of research.

Applicants must have an MSc (or equivalent) in statistics. Some knowledge of epidemiology is desirable but not essential: an aptitude for applied methodological research in statistics is more important.

Applications, including a CV and the names of two referees, should be sent to Dr James Carpenter, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT (email [log in to unmask]), from whom further particulars can be obtained. For an informal discussion, telephone James (020 7927 2033) or Professor Stuart Pocock (020 7927 2413). The closing date for applications is Friday 20 August 2004.

Project outline: Statistical methods in pharmaco-epidemiology using large general practice databases

This project concerns design issues and statistical methods in pharmaco-epidemiology, i.e. the study of the use and effects of drug treatment in populations, as regards both effectiveness and safety. Although this can be viewed as an application of epidemiologic methods to pharmaceuticals, the nature of the large routinely collected databases, which typically form the principal source of information, means that this field poses challenges that often require special solutions in both study design and statistical analysis.
Further, the complex and dynamic context of pharmaco-epidemiology gives rise to fascinating and unique statistical challenges. Such challenges need to be addressed if the information in such databases is to be reliably and routinely used.

For instance, for any particular class of drugs (e.g. statins for reducing risk of coronary disease) it would be appropriate to investigate possible associations with several other diseases (e.g. Alzheimer's disease, eye cataract, suicide etc.). The case-cohort design seems well suited to such problems, since it makes use of all disease cases together with a random sub-sample of the whole cohort. Nested case-control studies for each disease might be a suitable alternative approach.

The analysis of data from large routinely collected databases also presents unique challenges. Propensity scores and weighting methods have been proposed in other settings to reduce bias caused by non-random allocation of treatments. However, their application to large-scale database analyses poses some special challenges, not least because of the considerable quantity of missing observations, which are inevitable in a large amount of routinely collected data. Subjects who have complete data on all exposures and confounders are likely to be both unrepresentative and a relatively small proportion of the total. Thus a conventional approach, such as using only individuals with complete data, is likely to be both biased and underpowered.

We intend to pursue a program of methodological research to address these questions. In order to focus and enhance the practical relevance of this research, it will be closely linked to, and illustrated by, specific potential drug-disease associations of interest to pharmaco-epidemiologists. The precise sequence of methodological issues to be tackled will unfold over time, but specific potential topics of interest are as follows:

1. How do case-cohort and traditional case-control designs compare in terms of efficiency and likely costs?

2. One problem with the case-cohort approach is the difficulty of taking account of general practice effects, since there is no matching of cases to controls. How much does this apparent deficiency matter?

3. What size of sub-cohort should be selected, and how should one restrict its sampling frame to take account of the characteristics of disease cases?

4. It has been suggested that repeated use of the same sub-cohort for multiple disease-drug inferences may carry some statistical penalty because of non-independence.

5. The use of nested case-control studies has the inefficiency of needing a separate selection of matched controls for each set of disease cases. However, one can easily match on practice, and the statistical analysis methods are then better established. It will be valuable to compare results (and the effort needed to produce them) for nested case-control and case-cohort designs applied to the same drug-disease questions.

6. Development of methods for coping with missing data in the analysis, including the use of propensity scores and related approaches when data are missing. This would involve the exploration and development of multiple imputation methods that take into account (i) information on the distribution of exposures and confounders available in the wider epidemiological literature, and (ii) the hierarchical structure of the data (with patients nested in practices etc.).

7. Given the novel approaches required, the development of appropriate statistical software and experience in handling such large databases will be useful for future applications.

Our methodological findings, backed up by experience in real-world pharmaco-epidemiological questions, should have a major impact on how best to make use of large databases such as the GPRD for studying drug safety and effectiveness in a primary care setting.
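
For readers unfamiliar with the case-cohort design mentioned in the outline (all disease cases together with a random sub-sample of the whole cohort), the selection step can be sketched roughly as follows. This is only an illustrative sketch: the function name, the 10% sub-cohort fraction and the toy cohort are assumptions made up for the example, not part of the project or of any GPRD tooling.

```python
import random


def case_cohort_sample(cohort, is_case, subcohort_fraction, seed=0):
    """Illustrative case-cohort selection (hypothetical helper).

    cohort: list of subject identifiers
    is_case: dict mapping subject identifier -> True if a disease case
    subcohort_fraction: assumed sampling fraction for the sub-cohort
    """
    rng = random.Random(seed)
    n_sub = round(subcohort_fraction * len(cohort))
    # The sub-cohort is drawn at random from the whole cohort,
    # without regard to case status.
    subcohort = set(rng.sample(cohort, n_sub))
    # Every disease case is retained, whether or not it was sampled.
    cases = {subject for subject in cohort if is_case[subject]}
    return subcohort, subcohort | cases


# Toy data, invented purely for illustration: 1000 subjects, 20 cases.
cohort = list(range(1000))
is_case = {subject: subject % 50 == 0 for subject in cohort}

subcohort, analysis_set = case_cohort_sample(cohort, is_case, 0.1)
print(len(subcohort), len(analysis_set))
```

Because the sub-cohort is chosen without reference to any one disease, the same `subcohort` could be combined with a different `is_case` mapping for a second disease. This is the reuse whose possible statistical penalty is raised in topic 4.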

Stuart Pocock
Medical Statistics Unit
London School of Hygiene and Tropical Medicine
Keppel Street
London WC1E 7HT
Tel +44 (0)20 7927 2413 (direct) 2230 (secretary)
Fax +44 (0)20 7637 2853