Java Data Mining Package
A Library for Machine Learning and Big Data Analytics

Get it now – it's free!

The Java Data Mining Package (JDMP) is an open source Java library for data analysis and machine learning. It facilitates access to data sources and machine learning algorithms (e.g. clustering, regression, classification, graphical models, optimization) and provides visualization modules. JDMP not only provides a number of its own algorithms and tools but also interfaces to other machine learning and data mining packages (Weka, LibLinear, Elasticsearch, LibSVM, Mallet, Lucene, Octave).

In a Nutshell

Includes many machine learning algorithms
Easy interface for data sets and algorithms
Multi-threaded and lightning fast
Handle terabyte-sized data
Visualize and edit as heatmap, graph, plot
Treat every type of data as a matrix
Supported file formats: TXT, CSV, PNG, JPG, HTML, XLS, XLSX, PDF, LaTeX, Matlab, MDB
Free and open source (LGPL)


Quick Links

Download Documentation

Screenshots

Iris Dataset

This screenshot shows the visualization of the Iris flower data set in JDMP. JDMP's Naive Bayes classifier was trained on the data and afterwards used to predict the target class of each sample.

Animal Dataset

This screenshot shows a dataset of animals with their properties like "can fly" or "has feathers". The dataset has been clustered using a self-organizing map.

Workspace

This screenshot shows the workspace of the Java Data Mining Package.

Features

The main focus of JDMP lies on a consistent data representation. Maybe you've heard that on Linux, everything is a file. In JDMP, everything is a Matrix! Well, not everything, but many objects do have a matrix representation. For example, you can combine several matrices to form a Variable, e.g. for a time series. You can access these matrices one by one or as a single big matrix, whichever is more suitable for your task. Several Variables are combined into a Sample, like the samples with input and target values in a classification task. Many Samples form a DataSet, which may be sorted or split for a cross-validation test. The DataSet can be accessed either sample by sample or as two big matrices: one for the input features and one for the target values.
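To make the Matrix, Variable, Sample, DataSet hierarchy more concrete, here is a minimal plain-Java sketch. The classes `ToyVariable`, `ToySample` and `ToyDataSet` are hypothetical and only mimic the idea described above; they are not JDMP's actual classes or API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy classes (not JDMP's API) mirroring the hierarchy
// Matrix -> Variable -> Sample -> DataSet.
class ToyVariable {
    // A Variable holds several matrices, e.g. one per time step of a time series.
    final List<double[][]> matrices = new ArrayList<>();

    void add(double[][] matrix) {
        matrices.add(matrix);
    }

    // View all matrices as one big matrix by stacking their rows.
    double[][] asBigMatrix() {
        List<double[]> rows = new ArrayList<>();
        for (double[][] m : matrices) {
            for (double[] row : m) {
                rows.add(row.clone());
            }
        }
        return rows.toArray(new double[0][]);
    }
}

class ToySample {
    // A Sample combines named Variables, e.g. "input" and "target".
    final Map<String, ToyVariable> variables = new LinkedHashMap<>();

    // Convenience setter: wraps a single row vector in a one-matrix Variable.
    ToySample set(String name, double[] values) {
        ToyVariable v = new ToyVariable();
        v.add(new double[][] { values.clone() });
        variables.put(name, v);
        return this;
    }
}

class ToyDataSet {
    final List<ToySample> samples = new ArrayList<>();

    // Access the whole DataSet as one big matrix per variable name:
    // row i holds the flattened values of that variable in sample i.
    double[][] matrixOf(String name) {
        double[][] result = new double[samples.size()][];
        for (int i = 0; i < samples.size(); i++) {
            double[][] big = samples.get(i).variables.get(name).asBigMatrix();
            result[i] = Arrays.stream(big).flatMapToDouble(Arrays::stream).toArray();
        }
        return result;
    }
}
```

With this layout, `matrixOf("input")` and `matrixOf("target")` give the two big matrices a classifier would typically consume. Stacking rows is only one possible way to present several matrices as a single one; the real library may choose a different layout.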

Algorithms can manipulate Variables, Samples or DataSets, e.g. to perform pre-processing or a classification task. In JDMP, data processing methods are deliberately separated from data sources, so that algorithms and data may reside on different computers and parallel processing becomes possible. However, distributed computing is not yet fully implemented and exists only as a "proof of concept".
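The following sketch illustrates why separating algorithms from data sources helps with parallel processing. It assumes a hypothetical `RowSource` interface (not part of JDMP): the `ColumnMeans` algorithm only talks to that interface, so it does not know where the rows are stored, and it is free to fetch and sum them with a parallel stream.

```java
import java.util.stream.IntStream;

// Hypothetical abstraction of a data source (not a JDMP interface). The rows
// could come from memory, a file or, in principle, a remote machine.
interface RowSource {
    int rowCount();
    int columnCount();
    double[] row(int index);
}

// Simple in-memory implementation of the hypothetical interface.
class ArrayRowSource implements RowSource {
    private final double[][] data;

    ArrayRowSource(double[][] data) {
        this.data = data;
    }

    public int rowCount() { return data.length; }

    public int columnCount() { return data.length == 0 ? 0 : data[0].length; }

    public double[] row(int index) { return data[index].clone(); }
}

// An "algorithm" that depends only on RowSource. Because it never touches
// the storage directly, it can fetch and sum the rows in parallel.
class ColumnMeans {
    static double[] compute(RowSource source) {
        int n = source.rowCount();
        int cols = source.columnCount();
        double[] sums = IntStream.range(0, n)
                .parallel()
                .mapToObj(source::row)
                .reduce(new double[cols], ColumnMeans::add);
        double[] means = new double[cols];
        for (int j = 0; j < cols; j++) {
            means[j] = n == 0 ? 0.0 : sums[j] / n;
        }
        return means;
    }

    // Element-wise sum that returns a new array, so the shared identity
    // array passed to reduce() is never modified.
    private static double[] add(double[] a, double[] b) {
        double[] result = new double[a.length];
        for (int j = 0; j < a.length; j++) {
            result[j] = a[j] + b[j];
        }
        return result;
    }
}
```

Swapping `ArrayRowSource` for, say, a file-based or network-based implementation would not require any change to `ColumnMeans`, which is the point of the separation.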

While some parts are pretty stable by now, a lot of development is still going on in other parts, which is why JDMP should be considered experimental and not yet ready for production use.

Too good to be true?

Well, to be perfectly honest with you, there are some things you will not like about JDMP, so here is a list of reasons why you may not want to use it:

The Java Data Mining Package is...

...not finished
While some parts are already pretty stable, a lot of development is still going on in other areas. There is no guarantee that everything will be working as expected. Be prepared for renamed methods and interfaces from one version to the next.
...not well documented
Since many things are going to change anyway, we didn't bother to write much documentation.

Wanna help?

Developers are welcome to contribute new features, test cases or documentation.

If you like this library and it does something useful for you, a small donation will be very much appreciated to help cover server costs and ensure the coffee supply for further development.

Thank you very much for supporting open source software!

Donate via PayPal

Download

Include via Maven

The easiest way to add JDMP to your projects is to include it via Maven. You will need at least the jdmp-core package, which contains the basic algorithm classes and data sets. Add these lines to the <dependencies> section in your pom.xml file:

<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-core</artifactId>
    <version>0.3.0</version>
</dependency>

The jdmp-gui package is useful when you want to display datasets or algorithms on the screen:

<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-gui</artifactId>
    <version>0.3.0</version>
</dependency>

Other dependencies can be added as required using the appropriate sub-packages of JDMP. Here is the full list:

<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-core</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-gui</artifactId>
    <version>0.3.0</version>
</dependency>

<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-bsh</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-complete</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-corenlp</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-examples</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-jetty</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-liblinear</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-libsvm</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-lucene</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-mallet</artifactId>
    <version>0.3.0</version>
</dependency>
<dependency>
    <groupId>org.jdmp</groupId>
    <artifactId>jdmp-weka</artifactId>
    <version>0.3.0</version>
</dependency>

Download JAR Packages

If you don't have Maven, you can download a jar file containing all JDMP packages here: jdmp-complete.jar

This package contains jdmp-core, the main part of the Java Data Mining Package, as well as additional features such as visualization and interfaces to other libraries, and a lot more. If you are new to this library, just invoke the main() method in org.jdmp.gui.JDMP:

java -jar jdmp-complete.jar

Then click on "Tools - JDMP Plugins" in the menu bar to see which third-party libraries are supported. Choose the tool you want, add the necessary dependencies to the classpath and restart JDMP.

Download Source Code from GitHub

The source code of JDMP is available from GitHub. You can clone the repository like this:

git clone https://github.com/jdmp/java-data-mining-package.git

This is the link to the repository: JDMP on GitHub

You are welcome to contribute, just send me a pull request on GitHub.

Documentation

Quick Start

// load example data set
ListDataSet dataSet = DataSet.Factory.IRIS();

// create a classifier
NaiveBayesClassifier classifier = new NaiveBayesClassifier();

// train the classifier using all data
classifier.trainAll(dataSet);

// use the classifier to make predictions
classifier.predictAll(dataSet);

// get the results
double accuracy = dataSet.getAccuracy();

System.out.println("accuracy: " + accuracy);
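Note that the Quick Start trains and predicts on the same samples, so the printed accuracy describes how well the classifier fits its training data rather than how well it will do on new data. A common remedy is k-fold cross-validation. The plain-Java sketch below shows how accuracy (the fraction of correct predictions) can be computed and how sample indices can be dealt into k folds. The `Evaluation` class is hypothetical and not part of JDMP, and it is an assumption that `getAccuracy()` uses the same definition of accuracy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical evaluation helpers; not part of the JDMP API.
class Evaluation {
    // Accuracy = fraction of samples whose predicted class equals the target class.
    static double accuracy(int[] predicted, int[] target) {
        if (predicted.length != target.length || target.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        int correct = 0;
        for (int i = 0; i < target.length; i++) {
            if (predicted[i] == target[i]) {
                correct++;
            }
        }
        return (double) correct / target.length;
    }

    // Shuffle the indices 0..n-1 and deal them into k folds of (almost) equal size.
    // Each fold serves once as the test set while the other k-1 folds are used for training.
    static List<List<Integer>> kFolds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }
}
```

For the 150 Iris samples, `kFolds(150, 10, seed)` yields ten folds of 15 samples each; averaging the accuracy over the ten test folds gives a less optimistic estimate than training and testing on all data.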

Self-Organizing Map

// load example data set
ListDataSet dataSet = DataSet.Factory.ANIMALS();

// create a self-organizing map
SelfOrganizingMap som = new SelfOrganizingMap();

// train the SOM using all data
som.trainAll(dataSet);

// use the SOM to make predictions
som.predictAll(dataSet);

// display dataset on the screen
dataSet.showGUI();
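For readers curious what training a self-organizing map involves, here is a toy one-dimensional SOM in plain Java. `ToySom` is a hypothetical class that uses the textbook online update rule (move the best-matching unit and its grid neighbors towards each sample, weighted by a Gaussian neighborhood, with learning rate and radius shrinking over time). JDMP's `SelfOrganizingMap` may use a 2-D grid, a different schedule and different parameters.

```java
import java.util.Random;

// Toy one-dimensional self-organizing map (hypothetical, not JDMP's SelfOrganizingMap).
// Units lie on a line; each unit has a weight vector in feature space.
class ToySom {
    final double[][] weights; // [unit][feature]

    ToySom(int units, int features, long seed) {
        Random rnd = new Random(seed);
        weights = new double[units][features];
        for (double[] w : weights) {
            for (int j = 0; j < features; j++) {
                w[j] = rnd.nextDouble();
            }
        }
    }

    // Best-matching unit: the unit whose weights are closest (squared Euclidean distance).
    int bestMatchingUnit(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int u = 0; u < weights.length; u++) {
            double d = 0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - weights[u][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = u;
            }
        }
        return best;
    }

    // Online training with linearly shrinking learning rate and neighborhood radius.
    void train(double[][] data, int epochs) {
        double sigma0 = weights.length / 2.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double progress = (double) epoch / epochs;
            double rate = 0.5 * (1 - progress) + 0.01;
            double sigma = sigma0 * (1 - progress) + 0.5;
            for (double[] x : data) {
                int bmu = bestMatchingUnit(x);
                for (int u = 0; u < weights.length; u++) {
                    int gridDist = u - bmu;
                    double h = Math.exp(-(gridDist * gridDist) / (2 * sigma * sigma));
                    for (int j = 0; j < x.length; j++) {
                        weights[u][j] += rate * h * (x[j] - weights[u][j]);
                    }
                }
            }
        }
    }

    // Average Euclidean distance between each sample and the weights of its BMU.
    double quantizationError(double[][] data) {
        double total = 0;
        for (double[] x : data) {
            double[] w = weights[bestMatchingUnit(x)];
            double d = 0;
            for (int j = 0; j < x.length; j++) {
                d += (x[j] - w[j]) * (x[j] - w[j]);
            }
            total += Math.sqrt(d);
        }
        return total / data.length;
    }
}
```

After training, each sample can be assigned to the cluster of its best-matching unit via `bestMatchingUnit(x)`; samples with similar properties (e.g. animals that "can fly" and "have feathers") would be expected to land on the same or neighboring units.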

Show the Workspace

org.jdmp.gui.JDMP.main(args);

List of JDMP Modules

`jdmp`	Parent package with just the master .pom file, without any Java code
`jdmp-bsh`	Plugin to incorporate BeanShell
`jdmp-complete`	Collection of all available JDMP modules in one meta package
`jdmp-core`	Main package of JDMP containing machine learning algorithms and functions
`jdmp-corenlp`	Plugin to incorporate algorithms from Stanford CoreNLP
`jdmp-examples`	Some simple examples of how to use the Java Data Mining Package
`jdmp-gui`	Plugin to enable visualization and graphics
`jdmp-jetty`	Plugin to incorporate the Jetty web server
`jdmp-liblinear`	Plugin to incorporate classification algorithms from LibLinear
`jdmp-libsvm`	Plugin to incorporate classification algorithms from LibSVM
`jdmp-lucene`	Plugin to enable text indexing using Apache Lucene
`jdmp-mallet`	Plugin to incorporate text mining algorithms from Mallet
`jdmp-weka`	Plugin to incorporate classification and clustering algorithms from Weka

Universal Java Matrix Package

JDMP uses the Universal Java Matrix Package (UJMP) as its mathematical back-end for matrix calculations. UJMP also provides most of JDMP's import and export filters and is used for visualization. Since most of JDMP's objects can be converted into a "matrix view", UJMP is a very important building block in JDMP: it helps keep the code simple and can handle very large matrices, even when they do not fit into memory. Import and export interfaces are provided for JDBC databases, TXT, CSV, Excel, Matlab, LaTeX, MTX, HTML, WAV, BMP and other file formats.
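One way a matrix library can handle matrices that do not fit into memory is to keep the values on disk and read or write individual entries on demand. The `FileBackedMatrix` class below is a hypothetical sketch of that idea using `RandomAccessFile` with a row-major layout. It is not how UJMP implements its disk-based matrices; a real implementation would add caching or memory mapping for speed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical file-backed matrix: values live in a file on disk rather than on
// the heap, so the size is limited by disk space instead of memory.
class FileBackedMatrix implements AutoCloseable {
    private final RandomAccessFile file;
    private final long rows;
    private final long cols;

    FileBackedMatrix(Path path, long rows, long cols) {
        this.rows = rows;
        this.cols = cols;
        try {
            this.file = new RandomAccessFile(path.toFile(), "rw");
            // Reserve 8 bytes per entry. The Java API leaves the content of newly
            // extended regions unspecified, so only read entries you have written.
            this.file.setLength(rows * cols * Double.BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Create a matrix backed by a temporary file that is deleted when the JVM exits.
    static FileBackedMatrix createTemp(long rows, long cols) {
        try {
            Path path = Files.createTempFile("matrix", ".bin");
            path.toFile().deleteOnExit();
            return new FileBackedMatrix(path, rows, cols);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Byte offset of entry (row, col) in row-major order.
    private long offset(long row, long col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return (row * cols + col) * Double.BYTES;
    }

    double get(long row, long col) {
        long pos = offset(row, col);
        try {
            file.seek(pos);
            return file.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void set(long row, long col, double value) {
        long pos = offset(row, col);
        try {
            file.seek(pos);
            file.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the entries being accessed are ever held in memory, which is why this approach scales beyond the heap size; the trade-off is that each access costs a disk seek unless the library adds a cache in front of it.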


Contributors

Holger Arndt

Project manager, JDMP core package, JDMP GUI package, interfaces to other libraries
Holger's Homepage

Markus Bundschus

Strategic consulting, marketing
Markus's Homepage

Andreas Nägele

Naive Bayes Classifier, test cases

Holger Arndt

Copyright