About

Java Data Mining Package

The Java Data Mining Package (JDMP) is an open source Java library for data analysis and machine learning. It facilitates access to data sources and machine learning algorithms (e.g. clustering, regression, classification, graphical models, optimization) and provides visualization modules. It includes a matrix library for storing and processing any kind of data, which can handle very large matrices even when they do not fit into memory. Import and export interfaces are provided for JDBC databases, TXT, CSV, Excel, Matlab, LaTeX, MTX, HTML, WAV, BMP and other file formats. JDMP provides a number of algorithms and tools of its own, as well as interfaces to other machine learning and data mining packages (Weka, LibSVM, Mallet, Lucene, Octave).

The main focus of JDMP is a consistent data representation. Maybe you’ve heard that in Linux, everything is a file. In JDMP, everything is a Matrix! Well, not everything, but many objects do have a matrix representation. For example, you can combine several matrices to form a Variable, e.g. for a time series. You can access these matrices one by one or as a single big matrix, whichever is more suitable for your task. Several Variables are combined into a Sample, like the samples with input and target values in a classification task. Many Samples form a DataSet, which may be sorted or split for a cross-validation test. The DataSet can be accessed either sample by sample or as two big matrices, one for the input features and one for the target values.
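
To make the "sample by sample or as a big matrix" idea concrete, here is a minimal sketch in plain Java. It does not use JDMP's actual classes: the names SketchSample, SketchDataSet, inputMatrix and targetMatrix are hypothetical and chosen only for illustration, and the matrices are plain double[][] arrays that are copied rather than exposed as live views.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these classes are NOT part of JDMP.
final class SketchSample {
    final double[] input;   // input features of one sample
    final double[] target;  // target values of one sample

    SketchSample(double[] input, double[] target) {
        this.input = input.clone();
        this.target = target.clone();
    }
}

final class SketchDataSet {
    private final List<SketchSample> samples = new ArrayList<>();

    void add(SketchSample sample) {
        samples.add(sample);
    }

    int size() {
        return samples.size();
    }

    // Access the data sample by sample.
    SketchSample get(int index) {
        return samples.get(index);
    }

    // Access all inputs as one matrix: one row per sample, one column per feature.
    // The rows are copied here for simplicity; a real matrix view would avoid the copy.
    double[][] inputMatrix() {
        double[][] matrix = new double[samples.size()][];
        for (int i = 0; i < matrix.length; i++) {
            matrix[i] = samples.get(i).input.clone();
        }
        return matrix;
    }

    // Access all targets as one matrix: one row per sample.
    double[][] targetMatrix() {
        double[][] matrix = new double[samples.size()][];
        for (int i = 0; i < matrix.length; i++) {
            matrix[i] = samples.get(i).target.clone();
        }
        return matrix;
    }
}

class DataSetSketchDemo {
    public static void main(String[] args) {
        SketchDataSet dataSet = new SketchDataSet();
        dataSet.add(new SketchSample(new double[] {5.1, 3.5}, new double[] {1, 0}));
        dataSet.add(new SketchSample(new double[] {6.2, 2.9}, new double[] {0, 1}));

        double[][] inputs = dataSet.inputMatrix();
        System.out.println("rows=" + inputs.length + ", columns=" + inputs[0].length);
    }
}
```

The same data is reachable through get(i) for per-sample work and through the two matrix methods for algorithms that prefer to operate on whole matrices.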

Algorithms can manipulate Variables, Samples or DataSets, e.g. to perform pre-processing or a classification task. In JDMP, data processing methods are separated from data sources, so that algorithms and data may reside on different computers and parallel processing becomes possible. However, distributed computing is not yet fully implemented and exists only as a “proof of concept”.
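
The separation of processing methods from data sources can be sketched as follows. This is an assumption-laden illustration in plain Java, not JDMP's API: RowSource, InMemorySource and ColumnMean are hypothetical names. The point is only that the algorithm depends on an interface, so the rows could just as well come from a file, a database or another machine.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical names for illustration; none of these are JDMP classes.

// A data source only promises to deliver rows; it says nothing about where they live.
interface RowSource extends Iterable<double[]> {
}

// One possible source: rows held in memory.
final class InMemorySource implements RowSource {
    private final List<double[]> rows;

    InMemorySource(List<double[]> rows) {
        this.rows = rows;
    }

    @Override
    public Iterator<double[]> iterator() {
        return rows.iterator();
    }
}

// A simple pre-processing step that works with any RowSource.
final class ColumnMean {
    static double[] compute(RowSource source) {
        double[] sum = null;
        int count = 0;
        for (double[] row : source) {
            if (sum == null) {
                sum = new double[row.length];
            }
            for (int j = 0; j < row.length; j++) {
                sum[j] += row[j];
            }
            count++;
        }
        if (sum == null) {
            return new double[0]; // empty source: no columns to average
        }
        for (int j = 0; j < sum.length; j++) {
            sum[j] /= count;
        }
        return sum;
    }
}

class SeparationSketchDemo {
    public static void main(String[] args) {
        RowSource source = new InMemorySource(
                List.of(new double[] {1.0, 2.0}, new double[] {3.0, 4.0}));
        System.out.println(Arrays.toString(ColumnMean.compute(source)));
    }
}
```

Swapping InMemorySource for a different RowSource implementation would leave ColumnMean unchanged, which is the property that makes running algorithms and data on different computers conceivable.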

Screenshot

Visualization of a Data Set in JDMP

Universal Java Matrix Package

JDMP uses the Universal Java Matrix Package (UJMP) as a mathematical back-end for matrix calculations. UJMP provides most of JDMP’s import and export filters and is used for visualization. Since most of JDMP’s objects can be converted into a “matrix view”, UJMP is a very important building block in JDMP and helps to keep the code nice and simple.

Developers Wanted!

We are still looking for developers to help us improve UJMP and JDMP. The most pressing needs right now are documentation and JUnit test cases, but we also need programmers for calculations, import/export filters and GUI improvements. If you would like to participate, please don’t hesitate to contact us!
