Data Management Plan Sample 2
Data Sharing Plan
The Informatics and Data Management (IDM) Core of this Program Project, Core B, will plan, capture, manage and track research data that is generated by Program researchers. IDM Sub-cores (at Dartmouth and Arizona) will store information for the three Program Projects, with the exception of the NMR- and MS-generated profile data that will be stored primarily at Imperial College, London, UK, in the laboratory where it will be created. IDM will work with the Imperial College group to implement secure electronic access to the profile data, either by direct Open Database Connectivity (ODBC) connection to the data source or by electronic access to a view of the relevant data.
IDM will assist TA (Core D) to collect and track biospecimens for the Program. Biospecimens will be linked to participant consent data in the Program central database, as well as to other epidemiologic, clinical and research data, through a barcode assigned to each biospecimen.
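The barcode linkage described above can be illustrated with a minimal sketch. It assumes a relational layout in which each biospecimen row carries its barcode and the participant's study ID; the table names, column names and sample values are hypothetical and are not taken from the Program database.

```python
import sqlite3

# In-memory database standing in for the Program central database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE consent (
    study_id            TEXT PRIMARY KEY,
    consent_for_storage INTEGER NOT NULL   -- 1 = consented, 0 = declined
);
CREATE TABLE biospecimen (
    barcode       TEXT PRIMARY KEY,        -- barcode assigned to each biospecimen
    study_id      TEXT NOT NULL REFERENCES consent(study_id),
    specimen_type TEXT NOT NULL
);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO consent VALUES (?, ?)",
    [("S-001", 1), ("S-002", 0)],
)
conn.executemany(
    "INSERT INTO biospecimen VALUES (?, ?, ?)",
    [("BC0001", "S-001", "serum"), ("BC0002", "S-002", "urine")],
)


def consent_for_barcode(connection, barcode):
    """Return (specimen_type, consent_for_storage) for a barcode, or None."""
    return connection.execute(
        """
        SELECT b.specimen_type, c.consent_for_storage
        FROM biospecimen AS b
        JOIN consent AS c ON c.study_id = b.study_id
        WHERE b.barcode = ?
        """,
        (barcode,),
    ).fetchone()
```

Scanning a barcode then resolves to the participant's consent status in a single lookup, and an unknown barcode returns no row.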
Study data will be shared among the Program staff via controlled access to a central Program website developed and maintained by IDM. All authorized study personnel will be able to access project and core information, documentation and data reports through the Program web portal.
IDM will collect and manage a standardized description (metadata) of Program, Project and Core procedures and data, as well as file locations for all data stored outside of the Program database. This catalog will allow the Program and NCI leadership to know at any time what research data has been collected and how to access that data. IDM staff will work with the Program investigators to encourage use of NCI- and NIH-standard data elements and formats in the generation and management of Program data, facilitating the sharing of Program raw and processed/analyzed data.
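As one possible illustration of such a catalog, the sketch below models a catalog entry and a lookup of data held outside the Program database. The field names, dataset names and location strings are hypothetical placeholders, not the Program's actual metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One metadata record describing a dataset (hypothetical fields)."""
    dataset_name: str
    project: str
    description: str
    data_format: str
    location: str              # where the data can be found
    in_program_database: bool  # False when stored outside the Program database


def external_locations(catalog):
    """Map dataset name to location for data stored outside the Program database."""
    return {
        entry.dataset_name: entry.location
        for entry in catalog
        if not entry.in_program_database
    }


# Hypothetical entries; the location strings are placeholders only.
catalog = [
    CatalogEntry(
        dataset_name="project1_questionnaires",
        project="Project I",
        description="Epidemiologic questionnaire data",
        data_format="database table",
        location="program-database",
        in_program_database=True,
    ),
    CatalogEntry(
        dataset_name="nmr_ms_profiles",
        project="Project I",
        description="NMR- and MS-generated profile data",
        data_format="instrument files",
        location="imperial-college-laboratory-storage",
        in_program_database=False,
    ),
]
```

Calling external_locations(catalog) lists only the datasets whose files must be reached outside the central database, which is the question the catalog is meant to answer.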
There will be three final datasets for this Program, one associated with each Project. These final datasets will include demographic, epidemiologic and laboratory data, as well as analysis data files. All data will be stored with personal identifiers stripped from the body of the data: records will be linked to the study participant via a study ID entirely unrelated to any patient information, and the link between the study ID and the identifying data will be stored separately. We will make the data and associated documentation available to interested groups or investigators outside of the Program after the publication of the primary study paper, after agreement among the Program investigators to release the data, and only under a data-sharing agreement that provides for: (1) a commitment to use the data only for research purposes; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.
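The separation of identifiers from research data described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the Program's procedure: the identifying field names, the "S-" prefix and the use of random hexadecimal study IDs are all hypothetical choices.

```python
import secrets


def deidentify(records, id_fields=("name", "mrn", "date_of_birth")):
    """Split records into de-identified rows and a separately stored link table.

    Returns (research_rows, link_table); link_table maps each study ID to the
    identifying fields removed from the corresponding record.
    """
    research_rows = []
    link_table = {}
    for record in records:
        # Random study ID, so it carries no patient information.
        study_id = "S-" + secrets.token_hex(8)
        while study_id in link_table:  # regenerate on the unlikely collision
            study_id = "S-" + secrets.token_hex(8)

        # Identifying fields go to the link table only.
        link_table[study_id] = {f: record[f] for f in id_fields if f in record}

        # Everything else stays in the research dataset, keyed by study ID.
        row = {k: v for k, v in record.items() if k not in id_fields}
        row["study_id"] = study_id
        research_rows.append(row)
    return research_rows, link_table


# Hypothetical participants, for illustration only.
participants = [
    {"name": "Jane Doe", "mrn": "000123", "date_of_birth": "1950-01-01",
     "age_group": "60-69", "polyp_count": 2},
    {"name": "John Roe", "mrn": "000456", "date_of_birth": "1962-05-17",
     "age_group": "50-59", "polyp_count": 0},
]
research_rows, link_table = deidentify(participants)
```

Only research_rows would be released under a data-sharing agreement; link_table would remain in separate, restricted storage.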
Overall Environment
The proposed Program Project Core B and Project I (Clinical Trial) will be implemented on the robust technical infrastructure already in place at Dartmouth. The BioInformatics Service Center (BSC) hardware, network, and security infrastructure includes development, test, and production environments for all of our system layers, including database and web. The BioInformatics Service Center maintains a bank of thirty-six HP ProLiant and Dell PowerEdge servers. The servers run the following operating systems: four run VMware ESX 3.5, five run Ubuntu Server LTS 6.06 Linux, one runs Ubuntu 7.10, two run Debian GNU/Linux 5.0, four run Windows 2000 Server and twenty run Windows 2003 Server R2. The primary networking OS is Microsoft Networking using Microsoft’s Active Directory on a TCP/IP network. The servers use RAID-protected disk storage systems and are attached to UPS battery backups for power protection. Backup consists of a multi-layered system incorporating disk-to-disk-to-tape backup and recovery. A secure off-site storage and rotation plan is used for tape backups. Internet communication with server-based systems requires encrypted connections. A network monitoring and alerting system using Dartware InterMapper monitors servers and network devices in real time to provide traffic, error, utilization and outage information. All servers are housed within a dedicated, temperature-controlled server room in the BioInformatics Service Center suite of offices at the EverGreen Center in Lebanon, New Hampshire. BioInformatics has a local area network using hardware-based Cisco PIX firewalls for network security and access control. Access to the World Wide Web is through the Dartmouth College network. The Service Center includes 1.5 FTE dedicated to network, hardware and software infrastructure management.
Staff office space for the BSC is located in the EverGreen Center, Suite 301, Lebanon, New Hampshire. A temperature-controlled, access-controlled machine room is located within the office suite. Both the EverGreen Center outer building entrance and Suite 301 itself are access-controlled.
The Cancer Prevention and Control’s Database Management Operations (CPC-DMO) group at Arizona Cancer Center (ACC) will provide the database management for Projects II and III of this Program. The Arizona Cancer Center IT team and the University Information Technology Services (UITS) at University of Arizona centrally provide the information technology support (hardware and networking environment) for CPC-DMO. Databases and web applications built by CPC-DMO reside in the inner layer of the secured network serviced by the network experts from UITS and three dedicated IT professionals in ACC. CPC-DMO has complete control over two web/database servers, two file servers and thirteen personal computer systems and has access to ACC’s shared IT services. The servers run Windows Server 2003 R2 and use RAID-protected disk storage systems. Servers are attached to UPS battery backups for power protection. Backup consists of a system incorporating disk-to-disk backup and recovery. The servers are located in temperature-controlled and access-controlled machine rooms, one in each of the three office locations in Tucson, Arizona: Leon Levy Cancer Center and two satellite locations (Copper Building and Fort Lowell Building). Study staff office space for CPC-DMO is located in the same buildings as the servers.
Core B of this Program will facilitate communication among Cores and Projects. The Core will be responsible for creating a Program website through which all investigator and staff contact information, teleconference and meeting information, meeting minutes, protocols and documentation, data collection, and reporting may be accessed by authorized Program personnel. The Core website will also store and report metadata (data that describes other data) for all studies, enabling authorized study staff to understand the status and location of data and study documentation for all Program Projects. Kristen Anton, Core B Director and Sub-core B1 Leader, will meet weekly with Fang Wang, Sub-core B2 Leader, to ensure that systems and data specifications are being developed within Program standards and the informatics and data management needs of all Projects are being met. Ms. Anton and Ms. Wang will collaborate to establish data and software standards for the Program, aligning the Program standards with published NCI standards, as well as to solve any technical issues that arise in the course of the Program. Ms. Anton will also participate in regular teleconferences with each of the other Cores and Projects, and in meetings and teleconferences with Project I. Ms. Wang will additionally participate in regular meetings and teleconferences with Projects II and III.
The Program will take advantage of the collaborations already in place between the BioInformatics Service Center and NIH, NCICB and NCI networks such as the Cancer Biomedical Informatics Grid (caBIG), the Early Detection Research Network (EDRN) and the Breast and Colon Cancer Family Registries (BC-CFR). Ms. Anton’s established collaborations with the Informatics components of these networks will allow this Program and Core both to utilize policies, procedures and standards from these global efforts and to contribute to them, aligning this Program with national standards and data sharing initiatives. The Program will also benefit tremendously from the database management and study management experience the CPC-DMO group has gained over the past 20 years via three Colon Cancer Prevention Program Projects, each involving a phase III clinical trial, and through collaborations with SWOG on the Colorectal SELECT Ancillary Study, with MD Anderson Cancer Center on the Bi-national Breast Cancer Study, and with the Dartmouth group and Oregon Health and Science University on the previous Polyp Pooling Project.