
Posted: May 18, 2025/Under: Uncategorized/By: admin

Subject Code: MA3022 – MA4022 – MA7022
MA3022 / MA4022 / MA7022 Data Mining and Neural Networks

Due by 03.02.2025
100 marks available

Theoretical Background and Two Mini-Research Projects

Theoretical Background (20 marks)

Give a description of classification and clustering (5 marks).
What is the difference between them? (5 marks)
Describe the KNN approach and Hart’s algorithm for data reduction (5 marks).
Describe the K-means algorithm (5 marks).
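As a rough illustration of the KNN approach, here is a minimal k-NN classifier in Python. It assumes 2-D points stored as `((x, y), label)` pairs and Euclidean distance; the name `knn_classify`, the default `k=3` and the tie-breaking rule are illustrative choices, not taken from the course materials.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training points. `train` is a list of ((x, y), label) pairs."""
    # Rank all training points by Euclidean distance to the query; keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count labels; with equal counts, most_common() keeps first-seen order,
    # so the label met earliest in distance order wins the tie.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two small, clearly separated groups.
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_classify(train, (0.5, 0.5)))  # A
print(knn_classify(train, (5.5, 5.2)))  # B
```

With `k=1` this reduces to the plain nearest-neighbour rule that Hart’s algorithm tries to preserve while discarding points.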

Project 1: Condensed Nearest Neighbour for Data Reduction in Nearest Neighbour Classifier (40 marks)

Go to web page:
https://github.com/Mirkes/Data_Mining_Softbook/wiki/KNN-and-potential-ene

Read the text, then download the application:
https://github.com/Mirkes/Data_Mining_Softbook/blob/master/knn/knn.jar

Task 1 (10 marks)

Study how the number of prototypes depends on the number of points for two convex well-separated classes.
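A minimal sketch of Hart’s condensing rule, useful for checking prototype counts outside the applet. It assumes 1-NN with Euclidean distance, a shuffled visiting order and two hypothetical convex classes (points uniform in two unit squares). `hart_condense`, `two_squares` and the `gap` parameter are illustrative names; knn.jar may pick its starting point and visiting order differently, so its counts need not match these.

```python
import math
import random

def hart_condense(points, labels, seed=0):
    """Hart's condensed nearest neighbour (CNN) rule with 1-NN and
    Euclidean distance. Returns the indices of the prototype store."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # the store depends on visiting order
    store = [order[0]]                  # start from one arbitrary point
    in_store = {order[0]}
    changed = True
    while changed:                      # repeat full passes until stable
        changed = False
        for i in order:
            if i in in_store:
                continue
            # Classify point i by its nearest prototype in the current store.
            nearest = min(store, key=lambda j: math.dist(points[i], points[j]))
            if labels[nearest] != labels[i]:
                store.append(i)         # misclassified: keep it as a prototype
                in_store.add(i)
                changed = True
    return store

def two_squares(n_per_class, rng, gap=2.0):
    """Hypothetical test data: two convex classes, uniform in unit squares
    whose edges are `gap` apart along each axis."""
    points, labels = [], []
    for label, offset in enumerate((0.0, 1.0 + gap)):
        for _ in range(n_per_class):
            points.append((offset + rng.random(), offset + rng.random()))
            labels.append(label)
    return points, labels

rng = random.Random(42)
for n in (10, 100, 1000):
    pts, labs = two_squares(n, rng)
    print(n, "points per class ->", len(hart_condense(pts, labs)), "prototypes")
```

With `gap=2.0` the largest within-class distance is √2 and the smallest between-class distance is 2√2, so every point is closer to any point of its own class than to any point of the other class and the store settles at two prototypes for every n. Smaller gaps or less compact shapes are the cases worth comparing.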

Task 2 (10 marks)

Prepare a series of examples with more sophisticated non-convex shapes of well-separated classes. Study how the number of prototypes depends on the number of points in these classes.
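One possible non-convex, well-separated pair for Task 2 is a disc surrounded by a ring. The hypothetical `two_rings` generator below is a Python sketch (radii chosen arbitrarily, not from the course materials) that samples points uniformly over each region’s area; half-moons or spirals could be generated in the same way.

```python
import math
import random

def two_rings(n_per_class, rng, inner=(0.0, 1.0), outer=(2.0, 3.0)):
    """Class 0 fills the region with radius in `inner`, class 1 the annulus
    with radius in `outer`. The outer class is non-convex, and the gap
    between the radius ranges keeps the two classes separated."""
    points, labels = [], []
    for label, (r_min, r_max) in enumerate((inner, outer)):
        for _ in range(n_per_class):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            # Taking the square root makes the points uniform over the area
            # instead of crowding towards the centre.
            r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
            points.append((r * math.cos(theta), r * math.sin(theta)))
            labels.append(label)
    return points, labels

pts, labs = two_rings(300, random.Random(0))
print(len(pts), "points")  # 600 points
```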

Task 3 (10 marks)

Study how the number of prototypes and outliers depends on the number of points for two well-separated classes with added uniformly distributed background noise (option: random).
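Background noise can also be imitated outside the applet. The sketch below assumes that “uniformly distributed noise” means points drawn uniformly from the bounding box of the data, each given one of the existing class labels at random; this is an interpretation, not necessarily what the applet’s random option does, and `add_uniform_noise` and its `margin` parameter are hypothetical.

```python
import random

def add_uniform_noise(points, labels, n_noise, rng, margin=0.0):
    """Return copies of points/labels with n_noise extra points drawn
    uniformly from the data's bounding box (enlarged by `margin`), each
    labelled with a class chosen uniformly at random."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi = min(xs) - margin, max(xs) + margin
    y_lo, y_hi = min(ys) - margin, max(ys) + margin
    classes = sorted(set(labels))
    noisy_points, noisy_labels = list(points), list(labels)
    for _ in range(n_noise):
        noisy_points.append((rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)))
        noisy_labels.append(rng.choice(classes))
    return noisy_points, noisy_labels

base = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (5.0, 5.0)]
noisy_pts, noisy_labels = add_uniform_noise(base, [0, 0, 1, 1], 20,
                                            random.Random(7), margin=1.0)
print(len(noisy_pts))  # 24
```

Noise points that land inside the other class’s region are the ones a condensing rule is likely to keep as extra prototypes, which is one reason to track prototypes and outliers separately.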

Task 4 (10 marks)

In conclusion, discuss the results and propose a hypothesis for further study.

Do not forget to save and submit the configurations of the classes and prototypes as figures!

Project 2: Dynamics of K-means Clustering (40 marks)

Go to web page:
https://github.com/Mirkes/Data_Mining_Softbook/wiki/k-means-and-k-medoids

Read the text, then download the application:
https://github.com/Mirkes/Data_Mining_Softbook/blob/master/kmeans/KMeansKMedoids.jar

Task 1 (10 marks) – Exploration

Find the final K-means configurations for a series of datasets and various initial generations of centroids. How many different configurations did you observe? How frequently did they appear? How many iterations were required?
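A minimal sketch of this experiment in Python, assuming plain Lloyd iterations, Forgy initialisation (K distinct data points chosen at random) and that two runs reach the “same configuration” when they split the points into the same clusters. `kmeans`, `explore` and the three-blob dataset are hypothetical; KMeansKMedoids.jar offers its own initialisation options and may count iterations differently.

```python
import math
import random
from collections import Counter

def kmeans(points, k, rng, max_iter=100):
    """Lloyd-style K-means from k random distinct data points. Returns
    (partition, passes): the partition is a frozenset of frozensets of point
    indices, so runs ending in the same clusters compare equal whatever the
    centroid order; passes counts assignment steps, including the last one
    that found no change."""
    centroids = [points[i] for i in rng.sample(range(len(points)), k)]
    assignment = None
    for passes in range(1, max_iter + 1):
        # Assignment step: every point joins its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are fixed too
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    clusters = {}
    for index, c in enumerate(assignment):
        clusters.setdefault(c, set()).add(index)
    return frozenset(frozenset(s) for s in clusters.values()), passes

def explore(points, k, runs=100, seed=0):
    """Run K-means `runs` times from random starts; return how often each
    final partition occurred and the pass counts observed for it."""
    rng = random.Random(seed)
    frequency = Counter()
    passes_seen = {}
    for _ in range(runs):
        partition, passes = kmeans(points, k, rng)
        frequency[partition] += 1
        passes_seen.setdefault(partition, []).append(passes)
    return frequency, passes_seen

# Hypothetical data: three Gaussian blobs of 30 points each.
data_rng = random.Random(1)
centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
data = [(cx + data_rng.gauss(0, 0.8), cy + data_rng.gauss(0, 0.8))
        for cx, cy in centres for _ in range(30)]
frequency, passes_seen = explore(data, k=3, runs=50)
print(len(frequency), "distinct final configurations in 50 runs")
for partition, count in frequency.most_common():
    seen = passes_seen[partition]
    print(f"{count} runs, mean passes {sum(seen) / len(seen):.1f}")
```

Comparing partitions rather than centroid lists avoids counting the same clustering twice just because the centroids were found in a different order.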