Mining Pathological Data To Support Medical Diagnostics

Abu Sayed Md. Latiful Hoque
Department of CSE, BUET
Dhaka, Bangladesh
asmlatifulhoque@cse.buet.ac.bd

Sharif Md. Saad Galib
Department of CSE, BUET
Dhaka, Bangladesh
galib2145@gmail.com

Mashrura Tasnim
Department of CSE, BUET
Dhaka, Bangladesh
mashrura_cse@yahoo.com

ABSTRACT
Predictive data mining is becoming an essential instrument for medical researchers. It offers methodological and technical solutions for the analysis of medical data, including the construction of prediction models on the basis of careful analysis. However, satisfactory results require an understanding of the main issues underlying these methods and the application of agreed, standardized procedures. In this work we briefly discuss the present structure of pathological data, the requirements for formulating efficient predictive data mining models, and the need to reform the present structure to meet those requirements. We also present our database preprocessing method for clinical data mining.

General Terms
Algorithms; Management; Design; Reliability.

Keywords
Clinical data mining, Data preprocessing, Assertion algorithm.

1.   INTRODUCTION
Data mining is the process of selecting, exploring and modeling large amounts of data in order to discover unknown patterns or relationships and thus provide a clear and useful result to the data analyst [1]. Techniques for solving data mining problems include multi-dimensional databases, machine learning, soft computing, data visualization, statistics, hypothesis testing, clustering, classification and regression. The success of data mining depends equally on the appropriate choice and combination of techniques and on the availability of well-structured data. In recent years, data mining has been widely used in medical research.
In practice, applications in clinical medicine that perform predictive modeling by analyzing the knowledge available in the clinical domain may benefit from specific data mining approaches. Predictive data mining methods may be applied to construct decision models for procedures such as prognosis, diagnosis and treatment planning.

A significant issue lies behind the development of a predictive data mining model. Prediction requires the data to include a special response variable, which may be categorical or numerical. This raises a number of issues, including the handling of missing data and noise, unseen cases, variation in the presentation of classification models, and the explanation of decisions reached when models are used in decision-making.

In this paper we limit our discussion to a data preprocessing method for developing a predictive data mining model from an existing database of pathological data. The existing database structure and the hazards in handling it are discussed in Section 3. In Section 4 we present our data preprocessing model. Our findings, along with our future plans for this model, are discussed in Section 5. Section 6 concludes the paper.

2.   RELATED WORKS
The uniqueness of clinical data mining is identified in the work of Cios and Moore [2]. They pointed out the heterogeneity of medical data; ethical, legal and social issues; statistical philosophy; and the special status of medicine as the major points of uniqueness of medical data. Analyzing these issues, they posed several questions that must be answered by the scientific community so that both the patients on whom the data are collected and the data miners can benefit. Prather et al. [3] described the processes involved in mining a clinical database, including data warehousing, data query and cleaning, and data analysis.
In their paper they illustrated how medical production systems can be warehoused and mined for knowledge discovery. Their work includes an elaborate description of a medical data mining process, which entails transferring the database from a comprehensive computer-based patient record system (CPRS) into a data warehouse server, creating a dataset for analysis by extracting and cleaning selected variables, and mining the data using exploratory factor analysis.

A critical assessment of data mining algorithms widely used in pharmacovigilance was presented by Hauben et al. [4]. They emphasized a number of issues that data miners need to consider when deploying the algorithms in real-life pharmacovigilance.

Bellazzi and Zupan [5] gave a methodological review of data mining, focusing on its data analysis process and highlighting some of the most relevant issues related to its application in clinical medicine. On the basis of elaborate analysis and experimental results, they provided general task descriptions and a simple set of guidelines that may be applied during the construction of clinical predictive models using data mining techniques.

From the above discussion it is evident that predictive data mining techniques are discussed and used in real-life decision making in various fields of medical science. However, we have noticed a lack of interest in applying such techniques to pathological data mining. Our work focuses on this gap.

3.   EXISTING DATABASE STRUCTURE
In this section we describe the classical database model used to store pathological data. Figure 1 shows a conceptual entity-relationship diagram of a pathological database that holds the pathological test records of all patients of several clinics. There is a many-to-many relationship between the entities clinic and patient: one patient can visit multiple clinics, and one clinic can be visited by multiple patients. The reference entity represents the set of tests that a doctor refers the patient to during a visit to a clinic. The relationships between patient and reference and between doctor and reference are both one-to-many, as one patient can be referred multiple times and one doctor can provide multiple references. There is a many-to-many relationship between the test entity and the reference entity, because one reference contains multiple tests while one test may belong to multiple references. This sort of entity relationship is quite common in existing clinical database systems, and such data must undergo careful preprocessing before it can be used in data mining. We discuss this issue in the next section.

4.   DATA PREPROCESSING
4.1   Combined Data-table
Figures 2 and 3 show a schema that allows us to convert the pathological data obtained from the database structure discussed in the previous section into a format suitable for running our association rule mining algorithm (Figures 4 and 5 show the same in tabular form). The schema contains a flat table named "test record" that has all the pathological tests as its attributes. The patient table stores all the patient data. There is a one-to-many relationship between the patient table and the test record table. Each record of the patient-test table represents the tests taken by the corresponding patient, and their results, during one particular visit to a clinic.
So in each record most of the attributes remain NULL. This arrangement makes the pathological test data obtained in the above manner suitable for running our mining algorithm.

Figure 1. ERD of the pathological database
Figure 2. Test-data relationship diagram
Figure 3. Flat table for running mining algorithm
Figure 4. Tabular form of test-data relationship

4.2   Dynamic Data-generation Methodology
Currently we do not have access to real pathological test data. We therefore use generated test data to test the effectiveness of our mining algorithm and to find interesting and useful associations among abnormal test results. We apply constraints in our data-generation algorithm so that the data, though generated, corresponds closely to real-world data. The test table and the group table help us apply those constraints. The test table holds a metadata record for each test. A test result can be either boolean or floating point, as indicated by the data-type field. The min and max fields give the range of a normal result of the test, provided the test's data type is floating point. The unit field restricts the unit of measurement for the corresponding test result.

To ensure that the choice of tests is relevant to a particular visit of a patient, we associate each test with a system (e.g. kidney, cardiovascular) and assign each relationship a probability of being referred. For example, suppose a test is related to the system "Cardiovascular" with 80% probability. This means that if a patient presents with symptoms of diseases associated with the cardiovascular system, there is an 80% probability that the doctor will recommend the test. This allows us to claim that our choices of tests are realistic when generating test data. The relationship is many-to-many because one test can belong to multiple systems.
For example, whichever system the patient's complaints belong to, a blood test will always be included. Figure 6 shows a sample of this probability assignment.

5.   RESULT AND DISCUSSION
We have used the data-generation algorithm to generate test data for testing the accuracy and effectiveness of our mining algorithm. We will develop a user interface to specify the input parameters of the algorithm. This interface will let us specify the physiological systems for which pathological test data are generated; e.g., if we select cardiovascular and kidney, test data will be generated randomly for these systems. We will also be able to specify the minimum probability for a test to be chosen randomly during test-data generation. As we need to discover association rules among abnormal test results, we will be able to specify a percentage x, telling the algorithm to randomly choose x% of the test data to have a result outside the normal range given in the "test table" (Figure 2). This will let us test the effectiveness of our algorithm flexibly by changing these parameters. Finally, after testing the mining algorithm, we will apply it to mine association rules in real-world pathological data. Pseudocode for our synthetic data generation algorithm is presented in Algorithm 1.

Figure 5. Tabular form of flat table of patients' records
Figure 6. Test data relationship
Figure 7. Test table

6.   CONCLUSION AND FUTURE WORKS
In this work we tried to identify the limitations of existing database systems that prevent us from applying advanced data mining technology. To obtain association rules among abnormal test results, the existing database must be restructured. We presented our data preprocessing methodology for running our data mining algorithm on pathological data.
We have classified the attributes of the existing pathological database according to physiological system and disease symptoms. Applying the data generation methodology and following this classification, we have generated synthetic data. While generating the synthetic data, we imposed specific constraints so that the generated data is consistent with real-world data. With the generated synthetic data in hand, we will apply our mining algorithm to find association rules in it. We will then collect real data to validate our findings. Once sufficiently validated, these association rules will be applied in real-world pathological systems. Our future work will include a detailed description of the test results obtained by running this algorithm on processed pathological data.

7.   ACKNOWLEDGMENTS
This research is being conducted by the students and faculty members of the database and data mining research group at the Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET).

REFERENCES
[1]   Giudici, P. Applied Data Mining: Statistical Methods for Business and Industry. Wiley & Sons, 2003.
[2]   Cios, Krzysztof J., and G. William Moore. "Uniqueness of medical data mining." Artificial Intelligence in Medicine 26.1 (2002): 1-24.
[3]   Prather, Jonathan C., et al. "Medical data mining: knowledge discovery in a clinical data warehouse." Proceedings of the AMIA Annual Fall Symposium. American Medical Informatics Association, 1997.
[4]   Hauben, Manfred, et al. "The role of data mining in pharmacovigilance." (2005): 929-948.
[5]   Bellazzi, Riccardo, and Blaz Zupan. "Predictive data mining in clinical medicine: current issues and guidelines." International Journal of Medical Informatics 77.2 (2008): 81-97.
Algorithm 1: Data generation algorithm
Input: P = table of patients, S = table of systems, T = table of tests, C = table of constraints, R = relationship between tests and systems, N = number of rows to be generated
Output: F = flat table of patients' test records

for each row 1 to N do
    pick a random patient p from P
    insert a new row r for patient p
    choose a random number r1 of relevant systems
    for i = 1 to r1 do
        pick a relevant system s from table S
        choose a random number r2 of tests under system s
        for j = 1 to r2 do
            pick a test t from table T according to the conditions specified in R for system s and test t
            assign a value v to test t satisfying all the constraints of this test in table C
            update row r of table F
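
The generation procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation. The test names, normal ranges, referral probabilities and the function names `draw_value` and `generate_flat_table` are all hypothetical placeholders, not values from the paper and not clinical reference values. The sketch also simplifies one step: instead of first drawing a count of tests per system, it includes each test independently with its referral probability, which follows the 80% example in Section 4.2. The parameters `abnormal_rate` and `min_prob` stand in for the percentage x and the minimum probability described in Section 5.

```python
import random

# Hypothetical systems and test metadata (the "test table"); placeholders only.
SYSTEMS = ["cardiovascular", "kidney"]
TESTS = {
    "blood_test": {"dtype": "float", "min": 1.0, "max": 2.0},
    "test_a": {"dtype": "float", "min": 10.0, "max": 20.0},
    "test_b": {"dtype": "bool"},
}
# Relationship R: probability that a test is referred for a given system.
REFERRAL_PROB = {
    ("blood_test", "cardiovascular"): 1.0,
    ("blood_test", "kidney"): 1.0,
    ("test_a", "cardiovascular"): 0.8,
    ("test_b", "kidney"): 0.6,
}


def draw_value(meta, abnormal, rng):
    """Return a result inside the normal range, or outside it when abnormal."""
    if meta["dtype"] == "bool":
        # Assumption: True marks an abnormal boolean result.
        return abnormal
    lo, hi = meta["min"], meta["max"]
    if not abnormal:
        return rng.uniform(lo, hi)
    # Push abnormal values strictly above or below the normal range.
    # A real generator would probably clamp these to plausible limits.
    offset = rng.uniform(0.1, 1.0) * (hi - lo)
    return hi + offset if rng.random() < 0.5 else lo - offset


def generate_flat_table(patients, n_rows, abnormal_rate=0.2, min_prob=0.0, seed=None):
    """Build n_rows flat records: one column per test, None where not taken."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # One row per visit; every test column starts as NULL (None).
        row = {"patient_id": rng.choice(patients)}
        row.update({name: None for name in TESTS})
        # Choose a random, non-empty subset of systems for this visit.
        chosen = rng.sample(SYSTEMS, rng.randint(1, len(SYSTEMS)))
        for system in chosen:
            for name, meta in TESTS.items():
                if row[name] is not None:
                    continue  # already referred through another system
                prob = REFERRAL_PROB.get((name, system), 0.0)
                if prob < min_prob or rng.random() >= prob:
                    continue  # test not referred for this system
                is_abnormal = rng.random() < abnormal_rate
                row[name] = draw_value(meta, is_abnormal, rng)
        rows.append(row)
    return rows
```

Calling `generate_flat_table(["p1", "p2"], n_rows=5, seed=0)` returns five dictionaries, each with a `patient_id` key and one key per test. Tests that were not referred during the visit stay `None`, mirroring the mostly-NULL records of the flat table described in Section 4.1.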