- What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
(c) Explain how the evolution of database technology led to data mining.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
- a) Present an example where data mining is crucial to the success of a business. What data mining functions does this business need? Could they instead be performed by data query processing or simple statistical analysis?
b) Suppose your task as a software engineer at Big-University is to design a data mining system to examine the university's course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and the student's cumulative grade point average (GPA). Describe the architecture you would choose. What is the purpose of each component of this architecture?
c) How is a data warehouse different from a database? How are they similar?
d) Briefly describe the following advanced database systems and applications: object-relational databases, spatial databases, text databases, multimedia databases, the World Wide Web.
e) Define each of the following data mining functionalities: characterization, discrimination, association and correlation analysis, classification, prediction, clustering, and evolution analysis. Give examples of each data mining functionality, using a real-life database that you are familiar with.
f) What is the difference between discrimination and classification? Between characterization and clustering? Between classification and prediction? For each of these pairs of tasks, how are they similar?
- a) List and describe the five primitives for specifying a data mining task.
b) Describe why concept hierarchies are useful in data mining.
c) Outliers are often discarded as noise. However, one person's garbage could be another's treasure. For example, exceptions in credit card transactions can help us detect the fraudulent use of credit cards. Taking fraud detection as an example, propose two methods that can be used to detect outliers and discuss which one is more reliable.
- Recent applications pay special attention to spatiotemporal data streams. A spatiotemporal data stream contains spatial information that changes over time and arrives as stream data; that is, the data flow in and out as possibly infinite streams.
(a) Present three application examples of spatiotemporal data streams.
(b) Discuss what kind of interesting knowledge can be mined from such data streams, with limited time and resources.
(c) Identify and discuss the major challenges in spatiotemporal data mining.
(d) Using one application example, sketch a method to mine one kind of knowledge from such stream data efficiently.
- a) Describe the differences between the following approaches for the integration of a data mining system with a database or data warehouse system: no coupling, loose coupling, semi-tight coupling, and tight coupling. State which approach you think is the most popular, and why.
- Suppose that your local bank has a data mining system. The bank has been studying your debit card usage patterns. Noticing that you make many transactions at home renovation stores, the bank decides to contact you, offering information regarding their special loans for home improvements.
a) Discuss how this may conflict with your right to privacy.
b) Describe another situation where you feel that data mining can infringe on your privacy.
c) Describe a privacy-preserving data mining method that may allow the bank to perform customer pattern analysis without infringing on customers’ right to privacy.
d) What are some examples where data mining could be used to help society? Can you think of ways it could be used that may be detrimental to society?
- a) What are the major challenges faced in bringing data mining research to market? Illustrate one data mining research issue that, in your view, may have a strong impact on the market and on society. Discuss how to approach such a research issue.
b) In your view, what is the most challenging research problem in data mining? If you were given several years and a sizable team of researchers and implementers, could you devise a plan to make progress toward a solution to this problem? How?
- General-purpose computers and domain-independent relational database systems have become a large market in the last several decades. However, many people feel that generic data mining systems will not prevail in the data mining market. What do you think? For data mining, should we focus our efforts on developing domain-independent data mining tools or on developing domain-specific data mining solutions? Present your reasoning.
- a) What are the major challenges of mining a huge amount of data (such as billions of tuples) in comparison with mining a small amount of data (such as a data set of a few hundred tuples)?
b) Outline the major research challenges of data mining in one specific application domain, such as stream/sensor data analysis, spatiotemporal data analysis, or bioinformatics.