Information Security and Computer Fraud
ISSN (Print): 2376-9602 ISSN (Online): 2376-9629 Website: http://www.sciepub.com/journal/iscf Editor-in-chief: Sergii Kavun
Information Security and Computer Fraud. 2014, 2(3), 33-38
DOI: 10.12691/iscf-2-3-1

The DISTANCE Model for Collaborative Research: Distributing Analytic Effort Using Scrambled Data Sets

Howard H. Moffet1, E. Margaret Warton1, Melissa M. Parker1, Jennifer Y. Liu1, Courtney R. Lyles2 and Andrew J. Karter1

1Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA 94612

2Center for Vulnerable Populations, University of California San Francisco, 1001 Potrero Ave., San Francisco, CA 94110

Pub. Date: October 20, 2014

Cite this paper:
Howard H. Moffet, E. Margaret Warton, Melissa M. Parker, Jennifer Y. Liu, Courtney R. Lyles and Andrew J. Karter. The DISTANCE Model for Collaborative Research: Distributing Analytic Effort Using Scrambled Data Sets. Information Security and Computer Fraud. 2014; 2(3):33-38. doi: 10.12691/iscf-2-3-1

Abstract

Background: Data sharing is encouraged as a way to fulfill the ethical responsibility to transform research data into public health knowledge, but it carries risks of improper disclosure and of harm from the release of individually identifiable data.

Methods: The study objective was to develop and implement a novel method for scientific collaboration and data sharing that distributes the analytic burden while protecting patient privacy. A procedure was developed wherein an investigator who is external to an analytic coordinating center (ACC) can conduct original research following a protocol governed by a Publications and Presentations (P&P) Committee. The collaborating investigator submits a study proposal and, if it is approved, develops the analytic specifications using existing data dictionaries and templates. An original data set is prepared according to the specifications, and the external investigator is provided with a complete but de-identified and shuffled data set that retains all key data fields but obfuscates individually identifiable data and patterns. This “scrambled data set” provides a “sandbox” in which the external investigator can develop and test analytic code. The analytic code is then run against the original data at the ACC to generate output, which the external investigator uses in preparing a manuscript for journal submission.

Results: The method has been successfully used with collaborators to produce many published papers and conference reports.

Conclusion: By distributing the analytic burden, this method can facilitate collaboration and expand analytic capacity, resulting in more science for less money.
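The scrambling step described in the Methods can be illustrated with a short sketch. The abstract does not specify the shuffling algorithm, so the code below makes a loud assumption: direct identifiers are dropped and each remaining column is permuted independently across rows. The function name `scramble`, the helper `mean_of`, and the field names `member_id`, `age`, and `hba1c` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random


def scramble(rows, id_fields=(), seed=None):
    """Return a scrambled copy of `rows` (a list of dicts with identical keys).

    Assumption (not taken from the paper): identifier fields are dropped and
    every other column is shuffled independently, so each field keeps its
    full set of values while the link between values in a row is broken.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    fields = [f for f in rows[0] if f not in id_fields]
    columns = {}
    for field in fields:
        values = [row[field] for row in rows]
        rng.shuffle(values)  # in-place permutation of this one column
        columns[field] = values
    # Reassemble rows from the independently permuted columns.
    return [{f: columns[f][i] for f in fields} for i in range(len(rows))]


def mean_of(rows, field):
    """Toy 'analytic code': runs unchanged on scrambled or original rows."""
    return sum(row[field] for row in rows) / len(rows)


if __name__ == "__main__":
    # Hypothetical original data held at the coordinating center.
    original = [
        {"member_id": 101, "age": 54, "hba1c": 7.2},
        {"member_id": 102, "age": 61, "hba1c": 8.1},
        {"member_id": 103, "age": 47, "hba1c": 6.5},
    ]
    sandbox = scramble(original, id_fields=("member_id",), seed=0)
    # The external investigator develops code against `sandbox`;
    # the same code is later run against `original` at the ACC.
    print(mean_of(sandbox, "age"), mean_of(original, "age"))
```

Under this assumption the scrambled set has the same fields, types, and row count as the original, which is what lets analytic code be developed and debugged externally. Row-level associations are deliberately destroyed, so results computed on the scrambled set are not meaningful; only the output generated from the original data at the ACC is.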

Keywords:
data sharing, privacy rule, information dissemination, collaboration, cohort studies, epidemiology, de-identification

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
