Principles of Data Mining
Abstract
This book is designed to be suitable for an introductory course at either undergraduate
or master's level. It can be used as a textbook for a taught unit in
a degree programme in any of a wide range of subjects, including
Computer Science, Business Studies, Marketing, Artificial Intelligence, Bioinformatics
and Forensic Science. It is also suitable for use as a self-study book for
those in technical or management positions who wish to gain an understanding
of the subject that goes beyond the superficial. It covers far more than the
generalities of many introductory books on Data Mining but, unlike many other
books, it does not require a degree in Mathematics, or considerable fluency in
it, to be understood.
Mathematics is a language in which it is possible to express very complex
and sophisticated ideas. Unfortunately it is a language in which 99% of the human
race is not fluent, although many people have some basic knowledge of it
from early experiences (not always pleasant ones) at school. The author is a former
Mathematician who now prefers to communicate in plain English wherever
possible and believes that a good example is worth a hundred mathematical
symbols.
One of the author’s aims in writing this book has been to eliminate mathematical
formalism in the interests of clarity wherever possible. Unfortunately
it has not been possible to bury mathematical notation entirely. A ‘refresher’
of everything you need to know to begin studying the book is given in Appendix
A. It should be quite familiar to anyone who has studied Mathematics
at school level. Everything else will be explained as we come to it. If you have
difficulty following the notation in some places, you can usually ignore it
safely and concentrate on the results and the detailed examples given. For those
who would like to pursue the mathematical underpinnings of Data Mining in
greater depth, a number of additional texts are listed in Appendix C.