Course: Data Mining and Warehousing | Dr.T.Bhaskar Learning Portal

Announcements

Dear Learners,
Welcome to the course on Data Mining & Warehousing. It has the following course objectives and course outcomes:
Prerequisites: Database Management System

Course Objectives:

To understand the fundamentals of Data Mining.
To identify the appropriateness and need of mining the data.
To learn the pre-processing, mining and post-processing of the data.
To understand various distance measure techniques in data mining.
To understand clustering techniques and algorithms in data mining.
To understand classification techniques and algorithms in data mining.

Course Outcomes (COs): At the end of this course, students will be able to:

CO No.	Title	Bloom’s Taxonomy Level	Bloom’s Taxonomy Descriptor
CO1	Understand basic, intermediate and advanced techniques to mine the data.	2	Understand
CO2	Apply the pre-processing techniques on data	3	Apply
CO3	Explore the data warehouse and its design.	4	Analyze
CO4	Examine the hidden patterns in the data	4	Analyze
CO5	Apply the mining process by frequent pattern analysis techniques.	3	Apply
CO6	Demonstrate the Classification techniques for realistic data.	3	Apply

Text Books:

Sr. No.	Authors	Title	Edition	Year	Publication
1	Jiawei Han, Micheline Kamber, Jian Pei	“Data Mining: Concepts and Techniques”			Elsevier Publishers
2	Mohammed J. Zaki, Wagner Meira Jr.	“Data Mining and Analysis”			Cambridge University Press

Reference Books:

Sr. No.	Authors	Title	Edition	Year	Publication
1	Vipin Kumar	“Introduction to Data Mining”			Pearson
2	Ikhvinder Singh	“Data Mining & Warehousing”			Khanna Publishing House
3	Charu C. Aggarwal	“Data Mining: The Textbook”			Springer
4	Ian H. Witten, Eibe Frank	“Data Mining: Practical Machine Learning Tools and Techniques”			Elsevier Publishers
5	Luís Torgo	“Data Mining with R: Learning with Case Studies”			CRC Press, Taylor and Francis Group
6	Carlo Vercellis	“Business Intelligence: Data Mining and Optimization for Decision Making”			Wiley Publications

E-Resources:

Sr. No.	Link
1	https://nptel.ac.in/courses/106105174
2	https://ocw.mit.edu/courses/15-062-data-mining-spring-2003/

Thanks & Regards,
Dr. T. Bhaskar
Associate Professor (Computer Engg), Sanjivani College of Engineering, Kopargaon
Google-Site: https://sites.google.com/view/bhaskart/ug-notes/datamining-warehousing
Moodle-Site: https://proftbhaskar.gnomio.com/course/view.php?id=3 (Log in as Guest)
DMW YouTube Playlist: https://tinyurl.com/DMW-Bhaskar

Announcements Forum

UNIT-I: Introduction to Data Mining
Syllabus: Data Mining, kinds of patterns and technologies, Data Mining Task Primitives, issues in mining, KDD vs. data mining, OLAP, knowledge representation, data pre-processing: cleaning, integration, reduction, transformation and discretization; Data: Data, Information and Knowledge; Attribute Types: Nominal, Binary, Ordinal and Numeric attributes, Discrete versus Continuous Attributes.
- DMW UNIT-I Notes URL
  
  This link has the PPT for UNIT-I: Basics of Data Mining.
  
  DMW Videos Playlist: https://tinyurl.com/DMW-Bhaskar

UNIT-II: Data Pre-processing
Syllabus: Introduction to Data Pre-processing, Data Cleaning: Missing values, Noisy data; Data integration: Correlation analysis; transformation: Min-max normalization, z-score normalization
and decimal scaling; data reduction: Data Cube Aggregation, Attribute Subset Selection, sampling; and Data Discretization: Binning, Histogram Analysis.
- DMW UNIT-II Notes URL
  
  This link has the PPT for UNIT-II: Data Preprocessing.
  
  DMW Videos Playlist: https://tinyurl.com/DMW-Bhaskar
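As a hedged illustration of the three transformations named in the UNIT-II syllabus, the sketch below applies min-max normalization, z-score normalization and decimal scaling to a small list of numbers. The function names and the sample values are assumptions made for this example and are not taken from the course notes; the z-score version assumes the population standard deviation.

```python
import statistics


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map everything to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score_normalize(values):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def decimal_scale(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


# Hypothetical sample data (not from the course material).
incomes = [200, 300, 400, 600, 1000]
print(min_max_normalize(incomes))
print(z_score_normalize(incomes))
print(decimal_scale(incomes))
```

Min-max keeps the relative spacing of the values, z-score is less sensitive to the exact minimum and maximum, and decimal scaling only moves the decimal point.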

UNIT-III: Data Warehouse
Syllabus: Introduction to Data Pre-processing, Data Cleaning: Missing values, Noisy data; Data integration: Correlation analysis; transformation: Min-max normalization, z-score normalization
and decimal scaling; data reduction: Data Cube Aggregation, Attribute Subset Selection, sampling; and Data Discretization: Binning, Histogram Analysis.
- DMW UNIT-III Notes URL
  
  This link has the PPT for UNIT-III: Data Warehouse.
  
  DMW Videos Playlist: https://tinyurl.com/DMW-Bhaskar

UNIT-IV: Cluster Analysis: Measuring Similarity & Dissimilarity
Syllabus: Measuring Data Similarity and Dissimilarity, Proximity Measures for Nominal Attributes and Binary Attributes, interval scaled; Dissimilarity of Numeric Data: Minkowski distance, Euclidean distance and Manhattan distance; Proximity Measures for Categorical, Ordinal Attributes, Ratio scaled variables; Dissimilarity for Attributes of Mixed Types, Cosine Similarity, partitioning methods: k-means, k-medoids.
- DMW UNIT-IV Notes URL
  
  This link has the PPT for UNIT-IV: Cluster Analysis: Measuring Similarity & Dissimilarity.
  
  DMW Videos Playlist: https://tinyurl.com/DMW-Bhaskar
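The numeric proximity measures listed in the UNIT-IV syllabus can be sketched in a few lines. The example below is a minimal illustration, assuming plain Python lists as vectors; the function names and sample points are hypothetical and not part of the course notes. Manhattan and Euclidean distance are written as the Minkowski distance of order 1 and 2 respectively.

```python
import math


def minkowski_distance(x, y, h):
    """Minkowski distance of order h (h >= 1) between two numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)


def manhattan_distance(x, y):
    """Minkowski distance with h = 1 (sum of absolute differences)."""
    return minkowski_distance(x, y, 1)


def euclidean_distance(x, y):
    """Minkowski distance with h = 2 (straight-line distance)."""
    return minkowski_distance(x, y, 2)


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical sample points (not from the course material).
p, q = [1, 2], [4, 6]
print(manhattan_distance(p, q))
print(euclidean_distance(p, q))
print(cosine_similarity(p, q))
```

Distances grow as objects become less alike, while cosine similarity grows as they become more alike, so the two are not interchangeable without a conversion.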

UNIT-V: Frequent Pattern Analysis
Syllabus: Market Basket Analysis, Frequent item set, closed item set & Association Rules, mining multilevel association rules, constraint-based association rule mining, Generating Association Rules from Frequent Item sets, Apriori Algorithm, Improving the Efficiency of Apriori, FP-Growth Algorithm. Mining Various Kinds of Association Rules: Mining multilevel association rules, constraint-based association rule mining, Metarule-Guided Mining of Association Rules.
- DMW UNIT-V Notes URL
  
  This link has the PPT for UNIT-V: Frequent Pattern Analysis.
  
  DMW Videos Playlist: https://tinyurl.com/DMW-Bhaskar

UNIT-VI: Classification
Syllabus: Introduction, classification requirements, methods of supervised learning, decision trees: attribute selection, tree pruning, ID3, scalable decision tree techniques, rule extraction from decision tree, Regression, Bayesian Belief Networks, Training Bayesian Belief Networks, Classification Using Frequent Patterns, Associative Classification, Lazy Learners: k-Nearest-Neighbour Classifiers, Case-Based Reasoning, Multiclass Classification, Metrics for Evaluating Classifier Performance, Evaluating the Accuracy of a Classifier.
- DMW UNIT-VI Notes URL
  
  This link has the PPT for UNIT-VI: Classification.
  
  DMW Videos Playlist: https://tinyurl.com/DMW-Bhaskar

DMW LAB Evaluation Rubrics:

Rubrics for Assessment of Data Mining & Warehousing Lab

Evaluation of each practical assignment is based on the following criteria. Each assignment is evaluated out of 10 marks.

Criteria	Excellent	Good	Average	Poor
Write Ups (2)	Timely submission within deadline in all respects. (2)	Timely submission but needs some improvement. (1)	Submission with maximum one-week delay. (1)	Delayed in submission or found copied. (0)
Understanding (4)	Understands all the concepts, algorithm, or logic. (4)	Understands the concepts, algorithm, or logic but needs improvement. (3)	Limited understanding of the concepts, algorithm, or logic; needs more improvement. (2)	Failed to understand the concepts, algorithm, or logic. (0)
Performance (4)	Implemented the concepts, algorithm, or logic with correct expected output considering test cases. (4)	Implemented the concepts, algorithm, or logic with expected results. (3)	Implemented the concepts, algorithm, or logic with partial results; needs improvement. (1-2)	Not implemented and no output. (0)

DMW Lab Software
- DMW Lab Software URL
  
  DMW Lab Software
  
  Anaconda Software Framework

DMW LAB-1: Data Preprocessing PR Assignment
- Data Preprocessing Google Colab Link URL
  
  Data Preprocessing PR Assignment Google Colab Link

DMW LAB-2: Visualize the clusters using a suitable tool (WEKA)
Consider a suitable dataset. Apply at least two different clustering techniques to group the data instances, then visualize the clusters using a suitable tool.
- Implementation Details Link URL
  
  Implementation Details Link
  
  Data Set & Theory Details:

DMW LAB-3: Decision Tree
- Decision Tree implementation with Colab & Weka URL
  
  Decision Tree implementation with Colab & Weka

DMW LAB-4: Association Rules Implementation
Apply the Apriori algorithm to find frequently occurring items in the given data and generate strong association rules using support and confidence thresholds. For example: Market Basket Analysis.
- Association Rules Implementation Details URL
  
  Association Rules Implementation Details
  
  Theory & Data Set Link:
  
  Hands-on Video Link:
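As a starting point for this lab, the sketch below shows one possible level-wise Apriori implementation followed by rule generation with support and confidence thresholds. It is a minimal illustration under stated assumptions, not the lab's reference solution: the function names, the thresholds and the five market baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for strong rules."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


# Hypothetical market baskets (not from the lab data set).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=0.6)
for lhs, rhs, sup, conf in association_rules(frequent_itemsets, min_confidence=0.8):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Looking up the antecedent's support in the dictionary is safe because every subset of a frequent itemset is itself frequent, which is the same property the prune step relies on.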

DMW LAB-5: Classify text documents and evaluate precision and recall
Consider a suitable text dataset. Remove stop words, then apply stemming and feature selection techniques to represent documents as vectors. Classify the documents and evaluate precision and recall.
- Implementation Details URL
  
  Implementation Details
  
  Theory & Dataset Link:
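Two steps of this lab are easy to show in isolation: turning a document into a term-count vector after stop-word removal, and computing precision and recall from predicted labels. The sketch below is a minimal illustration only. The tiny stop-word list, the vocabulary, the labels and the function names are all assumptions for this example, and stemming, feature selection and the classifier itself are left out.

```python
from collections import Counter

# Tiny hypothetical stop-word list; a real lab would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in"}


def to_vector(text, vocabulary):
    """Bag-of-words counts over a fixed vocabulary, ignoring stop words."""
    tokens = [w for w in text.lower().split() if w not in STOP_WORDS]
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical vocabulary and labels (not from the lab data set).
print(to_vector("The price of the stock", ["price", "stock", "goal"]))
y_true = ["sports", "sports", "politics", "sports", "politics", "politics"]
y_pred = ["sports", "politics", "politics", "sports", "sports", "politics"]
print(precision_recall(y_true, y_pred, positive="sports"))
```

Precision asks how many of the documents labelled with a class really belong to it, while recall asks how many documents of that class were found, so both are reported per class.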

DMW LAB Extra PR Assignment: Usage of an Open-Source ETL Tool
For an organization of your choice, choose a set of business processes. Design star/snowflake schemas for analyzing these processes. Create a fact constellation schema by combining them. Extract data from different data sources, apply suitable transformations and load it into destination tables using an ETL tool. For example: a business organization with Sales, Order and Marketing processes.
- This link has LP-II (DMW PR Extra Asgmt) implementation details. URL
  
  DMW LAB: Usage of an Open-Source ETL Tool.