The EUBIROD Network

1st Annual EUBIROD Meeting

Dasman Center for Research and Treatment of Diabetes, Kuwait City

Kuwait City, Kuwait, 2nd-4th May 2009

Distributed Statistical Analysis Software

F. Carinci, Serectrix, Italy

First BIRO Academy Residential Course, Kuwait City, Kuwait, 2nd May 2009

Sustainable solutions for the routine provision of strategic data across Europe require highly collaborative frameworks. The BIRO approach rests on a collectively agreed sequence of two data processing steps, one local and one central, each involving key statistical procedures. A “Statistical Engine” is required to derive aggregate tables from databases held at the regional level; these tables are then sent to a central BIRO server.

In this presentation, Fabrizio Carinci, a senior biostatistician leading the development of the statistical engine for Serectrix, describes how specialised software has been built to run the same routines in each partner's region: local data are exported to a standardised database, formatted according to common criteria, which is then processed by the R software.

Fabrizio explains how the approach can be used to implement and disseminate advanced statistical methods for the production of diabetes indicators and the analysis of population-based data. Open-source statistical software is now extremely powerful and may allow users to replicate and further extend the approach for their own purposes.

R has been adopted as the development platform for all BIRO statistical software. The statistical engine connects to the local database using R PostgreSQL drivers. Through the notion of a “statistical object”, tables are created to store aggregates of local data (e.g. arithmetic means, percentiles, variances) as flat, comma-delimited text files. A taxonomy defines all the objects implemented. The BIRO report template served as a guide for data processing and the subsequent transfer to the central server, where the central statistical component runs the overall analysis and produces the global report.
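The “statistical object” idea can be illustrated with a minimal Python sketch. It assumes, hypothetically, that one object stores the count, mean and sample variance of a single variable per centre in a comma-delimited file, and that the central step pools these summaries into overall figures. The names `Aggregate`, `local_aggregate`, `write_csv`, `read_csv` and `pool`, the column layout and the variable name `hba1c` are illustrative only; they are not the BIRO taxonomy or its R implementation. Percentiles are deliberately left out, since they generally cannot be combined exactly from per-centre summaries alone.

```python
import csv
import io
import math
import statistics
from dataclasses import dataclass

# Hypothetical column layout for one "statistical object" file.
FIELDS = ["centre", "variable", "n", "mean", "variance"]


@dataclass
class Aggregate:
    """Summary of one variable in one centre (illustrative, not the BIRO schema)."""
    centre: str
    variable: str
    n: int
    mean: float
    variance: float  # sample variance, denominator n - 1


def local_aggregate(centre, variable, values):
    """Local step: reduce patient-level values to n, mean and variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    return Aggregate(centre, variable, n, mean, variance)


def write_csv(aggregates, stream):
    """Write aggregates as a flat comma-delimited text table with a header row."""
    writer = csv.writer(stream)
    writer.writerow(FIELDS)
    for a in aggregates:
        writer.writerow([a.centre, a.variable, a.n, a.mean, a.variance])


def read_csv(stream):
    """Central step, part 1: parse a file received from the centres."""
    return [
        Aggregate(r["centre"], r["variable"], int(r["n"]),
                  float(r["mean"]), float(r["variance"]))
        for r in csv.DictReader(stream)
    ]


def pool(aggregates):
    """Central step, part 2: combine per-centre summaries exactly.

    Total sum of squares = within-centre part + between-centre part.
    Returns (total n, overall mean, overall sample variance).
    """
    total_n = sum(a.n for a in aggregates)
    grand_mean = sum(a.n * a.mean for a in aggregates) / total_n
    ss = sum((a.n - 1) * a.variance + a.n * (a.mean - grand_mean) ** 2
             for a in aggregates)
    return total_n, grand_mean, ss / (total_n - 1)


# Usage with made-up values for two hypothetical centres.
centre_a = [7.1, 6.8, 8.2, 7.5]
centre_b = [6.9, 7.7, 8.0]

buf = io.StringIO(newline="")
write_csv([local_aggregate("A", "hba1c", centre_a),
           local_aggregate("B", "hba1c", centre_b)], buf)
buf.seek(0)
n, mean, var = pool(read_csv(buf))

# The pooled results should agree with a direct computation on the combined raw data.
print(n, math.isclose(mean, statistics.mean(centre_a + centre_b)),
      math.isclose(var, statistics.variance(centre_a + centre_b)))
```

Storing n, mean and variance rather than raw values is what allows the central step in this sketch to reproduce the overall mean and variance exactly, without any patient-level records leaving a centre.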

Dr. Carinci explains how the BIRO statistical and central engines have been developed to run on different platforms and tested on both Windows Vista and Linux. In terms of performance, average hardware completed a full local BIRO report from a test sample of more than 5,000 patients and several thousand episodes in about 7 minutes. The central engine, using aggregate data from N=5 centres corresponding to over 43,000 subjects and 273,000 episodes, completed the entire statistical analysis and produced a full overall report in 22 minutes. Installation is identical regardless of hardware and requires R > 1.8, LaTeX, Java 6.0 and PostgreSQL, plus various additional libraries/packages included in the distribution. All R functions are released under the GPL licence and made available to partners of the Consortium, bundled with all other BIRO components.

In his conclusions, Fabrizio highlights that the statistical engine provides a platform for accurate benchmarking that does not currently exist, in this innovative form, at the point of health care provision. The system may serve multiple users, from the European Union to the local physician. Once trained in the software, users can apply it independently and submit better aggregate data to the central server, while sensitive data are safeguarded by the rigorous rules set out in the BIRO privacy impact assessment. The free availability of a modern statistical component can help disseminate the BIRO approach across Europe.