Data Mining and Warehousing
Dear Learners,
Welcome to the course on Data Mining & Warehousing. The course objectives and course outcomes are listed below.
Prerequisites: Database Management System
Course Objectives:
- To understand the fundamentals of Data Mining.
- To identify the appropriateness and need of mining the data.
- To learn the pre-processing, mining and post processing of the data.
- To understand various distance measure techniques in data mining.
- To understand clustering techniques and algorithms in data mining.
- To understand classification techniques and algorithms in data mining.
Course Outcomes (COs): At the end of this course, students will be able to:
CO No. | Title | Bloom’s Taxonomy Level | Bloom’s Taxonomy Descriptor
CO1 | Understand basic, intermediate and advanced techniques to mine the data. | 2 | Understand
CO2 | Apply the pre-processing techniques on data | 3 | Apply
CO3 | Ability to explore the data warehouse and its design. | 4 | Analyze
CO4 | Examine the hidden patterns in the data | 4 | Analyze
CO5 | Apply the mining process by frequent pattern analysis techniques. | 3 | Apply
CO6 | Demonstrate the Classification techniques for realistic data. | 3 | Apply
Text Books:
Sr. No. | Authors | Title | Edition | Year | Publication
1 | Jiawei Han, Micheline Kamber, and Jian Pei | “Data Mining: Concepts and Techniques” | | | Elsevier Publishers
2 | Mohammed J. Zaki, Wagner Meira Jr. | “Data Mining and Analysis” | | | Cambridge University Press
Reference Books:
Sr. No. | Authors | Title | Edition | Year | Publication
1 | Vipin Kumar | “Introduction to Data Mining” | | | Pearson
2 | Ikhvinder Singh | “Data Mining & Warehousing” | | | Khanna Publishing House
3 | Charu C. Aggarwal | “Data Mining: The Textbook” | | | Springer
4 | Ian H. Witten, Eibe Frank | “Data Mining: Practical Machine Learning Tools and Techniques” | | | Elsevier Publishers
5 | Luís Torgo | “Data Mining with R: Learning with Case Studies” | | | CRC Press, Taylor and Francis Group
6 | Carlo Vercellis | “Business Intelligence: Data Mining and Optimization for Decision Making” | | | Wiley Publications
E-Resources:
Sr. No. | Link
1 |
2 |
Thanks & Regards,
Dr. T. Bhaskar
Associate Professor (Computer Engg), Sanjivani College of Engineering, Kopargaon
Google-Site: https://sites.google.com/view/bhaskart/ug-notes/datamining-warehousing
Moodle-Site: https://proftbhaskar.gnomio.com/course/view.php?id=3 (Log in as Guest)
DMW YouTube Playlist: https://tinyurl.com/DMW-Bhaskar
-
Syllabus: Data Mining, kinds of patterns and technologies, Data Mining Task Primitives, issues in mining, KDD vs. data mining, OLAP, knowledge representation; data pre-processing: cleaning, integration, reduction, transformation, and discretization. Data: Data, Information, and Knowledge; Attribute Types: Nominal, Binary, Ordinal, and Numeric attributes; Discrete versus Continuous Attributes.
-
Syllabus: Introduction to Data Pre-processing; Data Cleaning: missing values, noisy data; Data Integration: correlation analysis; Data Transformation: min-max normalization, z-score normalization, and decimal scaling; Data Reduction: data cube aggregation, attribute subset selection, sampling; Data Discretization: binning, histogram analysis.
-
Syllabus: Measuring Data Similarity and Dissimilarity; Proximity Measures for Nominal Attributes and Binary Attributes; interval-scaled variables; Dissimilarity of Numeric Data: Minkowski distance, Euclidean distance, and Manhattan distance; Proximity Measures for Categorical and Ordinal Attributes; ratio-scaled variables; Dissimilarity for Attributes of Mixed Types; Cosine Similarity; partitioning methods: k-means, k-medoids.
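As a rough illustration of the numeric dissimilarity measures named in this unit, the sketch below computes the Minkowski distance (Manhattan for order 1, Euclidean for order 2) and cosine similarity in plain Python. The function names and the sample points are assumptions chosen for this example, not part of the course material.

```python
import math

def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length numeric vectors."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical sample points, used only for illustration.
p, q = (1, 2), (4, 6)
manhattan = minkowski(p, q, 1)   # |1-4| + |2-6|
euclidean = minkowski(p, q, 2)   # sqrt(3^2 + 4^2)
similarity = cosine_similarity((1, 2), (2, 4))  # parallel vectors
```

With these points the Manhattan distance is 7 and the Euclidean distance is 5; the two parallel vectors have a cosine similarity of (approximately) 1.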
-
Syllabus: Market Basket Analysis; frequent itemsets, closed itemsets, and association rules; Generating Association Rules from Frequent Itemsets; Apriori Algorithm; Improving the Efficiency of Apriori; FP-Growth Algorithm. Mining Various Kinds of Association Rules: mining multilevel association rules, constraint-based association rule mining, metarule-guided mining of association rules.
-
Syllabus: Introduction, classification requirements, methods of supervised learning; decision trees: attribute selection, tree pruning, ID3, scalable decision tree techniques, rule extraction from decision trees; Regression; Bayesian Belief Networks, Training Bayesian Belief Networks; Classification Using Frequent Patterns, Associative Classification; Lazy Learners: k-Nearest-Neighbour Classifiers, Case-Based Reasoning; Multiclass Classification; Metrics for Evaluating Classifier Performance, Evaluating the Accuracy of a Classifier.
-
Rubrics for Assessment of Data Mining & Warehousing Lab
Evaluation of each practical assignment is based on the following criteria. Each assignment is evaluated out of 10 marks.
Criteria | Excellent | Good | Average | Poor
Write Ups (2) | Timely submission within deadline in all respects. (2) | Timely submission but needs some improvement. (1) | Submission with maximum one-week delay. (1) | Delayed in submission or found copied. (0)
Understanding (4) | Understand all the concepts, algorithm, or logic. (4) | Understand the concepts, algorithm, or logic but need improvement. (3) | Limited understanding of the concepts or algorithm or logic but need more improvement (2) | Failed to understand the concepts, algorithm, or logic. (0)
Performance (4) | Implemented the concepts, algorithm, or logic with correct expected output considering test cases. (4) | Implemented the concepts, algorithm, or logic with expected results. (3) | Implemented the concepts, algorithm, or logic with partial results and needs improvement. (2-1) | Not implemented and no output. (0)
-
Consider a suitable dataset. To cluster the data instances into different groups, apply at least two different clustering techniques. Visualize the clusters using a suitable tool.
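One of the two techniques could be k-means. The following is a minimal sketch in plain Python, assuming a toy 2-D dataset and seeding the centroids with the first k points; both choices are assumptions made for illustration, and a real submission would use its own dataset and probably a library implementation.

```python
def kmeans(points, k, iterations=100):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels)."""
    # Assumption: the first k points seed the centroids (keeps the run deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two visibly separated groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
```

The returned `labels` list gives one cluster index per point, which can then be passed to whichever plotting tool is chosen for the visualization step.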
-
Apply the Apriori algorithm to find frequently occurring items in the given data and generate strong association rules using support and confidence thresholds. For example: Market Basket Analysis.
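The assignment can be approached along the lines of the sketch below: a compact, unoptimized Apriori that finds frequent itemsets by support and then derives rules by confidence. The basket data, thresholds, and function names are assumptions for illustration only.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # L1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent (Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent),
                                  confidence))
    return rules

# Hypothetical market-basket transactions.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
           {"milk"}, {"bread", "milk"}]
frequent = apriori(baskets, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.7)
```

On this toy data, `butter` falls below the support threshold, so the only frequent pair is bread and milk, which yields one rule in each direction.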
-
Consider a suitable text dataset. Remove stop words, then apply stemming and feature selection techniques to represent the documents as vectors. Classify the documents and evaluate precision and recall.
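The pipeline in this assignment might look roughly like the sketch below. It is deliberately simplified: the stop-word list is a tiny assumed sample, the stemmer is crude suffix stripping rather than a real algorithm such as Porter's, feature selection is omitted, and the classifier is a 1-nearest-neighbour lookup using cosine similarity. All names and documents are hypothetical.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "of", "to", "in", "was"}  # assumed toy list

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def vectorize(text):
    """Lower-case, drop stop words, stem, and return a term-frequency vector."""
    tokens = [w for w in text.lower().split() if w not in STOP_WORDS]
    return Counter(stem(w) for w in tokens)

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def predict(train, text):
    """1-nearest-neighbour: label of the most similar training document."""
    vec = vectorize(text)
    return max(train, key=lambda pair: cosine(vec, pair[0]))[1]

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labelled documents.
train_docs = [
    ("the team scored a goal in the match", "sports"),
    ("players trained for the final match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stocks and markets dropped", "finance"),
]
train = [(vectorize(text), label) for text, label in train_docs]
test_docs = [("the players scored in the final", "sports"),
             ("markets and interest rates", "finance")]
y_true = [label for _, label in test_docs]
y_pred = [predict(train, text) for text, _ in test_docs]
precision, recall = precision_recall(y_true, y_pred, positive="sports")
```

For a real dataset, precision and recall would normally be reported per class on a held-out test split rather than on two hand-written sentences.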
-
For an organization of your choice, choose a set of business processes. Design star / snowflake schemas for analyzing these processes. Create a fact constellation schema by combining them. Extract data from different data sources, apply suitable transformations, and load the data into destination tables using an ETL tool. For example: for a business organization, the sales, order, and marketing processes.
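To make the star-schema and ETL idea concrete, here is a small sketch using Python's built-in `sqlite3` module in place of a dedicated ETL tool. The table names, columns, and raw rows are all hypothetical; a real submission would design its own schema for the chosen organization and use the ETL tool required by the lab.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Star schema: one fact table referencing two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, iso_date TEXT UNIQUE,
                       year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    amount     REAL
);
""")

# Extract: hypothetical raw rows as they might arrive from a source system.
raw_rows = [
    (" Laptop ", "2024-01-15", 2, 1500.0),
    ("laptop",   "2024-02-03", 1, 750.0),
    ("Mouse",    "2024-01-15", 5, 100.0),
]

def product_key(name):
    """Insert the product if new and return its surrogate key."""
    conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (name,))
    return conn.execute("SELECT product_id FROM dim_product WHERE name = ?",
                        (name,)).fetchone()[0]

def date_key(iso_date):
    """Insert the date if new and return its surrogate key."""
    year, month, _ = (int(part) for part in iso_date.split("-"))
    conn.execute(
        "INSERT OR IGNORE INTO dim_date (iso_date, year, month) VALUES (?, ?, ?)",
        (iso_date, year, month))
    return conn.execute("SELECT date_id FROM dim_date WHERE iso_date = ?",
                        (iso_date,)).fetchone()[0]

for name, iso_date, quantity, amount in raw_rows:
    clean_name = name.strip().title()  # Transform: normalise product names
    # Load: write the fact row with keys into the dimensions.
    conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 (product_key(clean_name), date_key(iso_date), quantity, amount))

# Analyse: total sales amount per product via a fact-to-dimension join.
totals = dict(conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name
""").fetchall())
```

A fact constellation would add a second fact table (for example, orders) that shares `dim_product` and `dim_date` with `fact_sales`.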