Archive for the ‘FAQ’ Category.

How do I use a Classifier?

Try this example:

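// Create the Iris data set using JDMP's DataSetFactory
ClassificationDataSet dataSet = DataSetFactory.IRIS();

// Create a Support Vector Machine classifier (backed by LibSVM)
SVMClassifier svm = new SVMClassifier();

// Train on the data set, then predict the class of its samples
svm.train(dataSet);
svm.predict(dataSet);

// Display the data set in JDMP's GUI
dataSet.showGUI();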

If you get an error, LibSVM is probably missing from your classpath: SVMClassifier uses the LibSVM library.
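To check whether LibSVM is on your classpath before running the example, you can try to load one of its classes. The following is a minimal plain-Java sketch, assuming the standard Java distribution of LibSVM, whose core class is `libsvm.svm`; `ClasspathCheck` and `isOnClasspath` are hypothetical names and not part of JDMP, and JDMP may load LibSVM through other classes as well.

```java
class ClasspathCheck {

    // Returns true if the named class can be found by this class's class loader.
    // The class is only located, not initialized (second argument is false).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: libsvm.svm is the core class of the Java version of LibSVM.
        if (isOnClasspath("libsvm.svm")) {
            System.out.println("LibSVM found on the classpath.");
        } else {
            System.out.println("LibSVM not found: add the LibSVM jar to the classpath.");
        }
    }
}
```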

How do I perform a market basket analysis?

First, create a RelationalDataSet and add each purchase (order) as a separate RelationalSample containing the products bought:

RelationalDataSet dataSet = new RelationalDataSet();
 
RelationalSample s = new RelationalSample("OrderID 1234");
s.addObject("Product 1");
s.addObject("Product 2");
s.addObject("Product 3");
s.addObject("Product 4");
dataSet.getSamples().add(s);
 
s = new RelationalSample("OrderID 1235");
s.addObject("Product 1");
s.addObject("Product 2");
s.addObject("Product 3");
dataSet.getSamples().add(s);
 
s = new RelationalSample("OrderID 1236");
s.addObject("Product 1");
s.addObject("Product 6");
s.addObject("Product 7");
dataSet.getSamples().add(s);
 
s = new RelationalSample("OrderID 1237");
s.addObject("Product 7");
s.addObject("Product 2");
s.addObject("Product 3");
s.addObject("Product 8");
dataSet.getSamples().add(s);
 
s = new RelationalSample("OrderID 1238");
s.addObject("Product 7");
s.addObject("Product 4");
s.addObject("Product 3");
s.addObject("Product 8");
dataSet.getSamples().add(s);

Then run the analysis and visualize the results:

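MarketBasketAnalysis mba = new MarketBasketAnalysis();
// Minimum support: how frequently a product combination must occur to be reported
mba.setMinSupport(2);
// Show the input data, then run the analysis and show the result
dataSet.showGUI();
RelationalDataSet result = mba.calculate(dataSet);
result.showGUI();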
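To make the minimum support setting more concrete, here is a plain-Java sketch that counts how many of the five orders above contain each pair of products and keeps the pairs that occur in at least two orders. It assumes that `setMinSupport(2)` means an absolute count of orders, which the FAQ does not state, and it only looks at pairs, so it illustrates the idea rather than reproducing JDMP's MarketBasketAnalysis. `PairSupport`, `frequentPairs` and `sampleOrders` are hypothetical names; the code needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class PairSupport {

    // The five orders from the example above, one list of products per order.
    static List<List<String>> sampleOrders() {
        return Arrays.asList(
            Arrays.asList("Product 1", "Product 2", "Product 3", "Product 4"),  // OrderID 1234
            Arrays.asList("Product 1", "Product 2", "Product 3"),               // OrderID 1235
            Arrays.asList("Product 1", "Product 6", "Product 7"),               // OrderID 1236
            Arrays.asList("Product 7", "Product 2", "Product 3", "Product 8"),  // OrderID 1237
            Arrays.asList("Product 7", "Product 4", "Product 3", "Product 8")); // OrderID 1238
    }

    // Counts, for every unordered pair of products, how many orders contain both,
    // and keeps only the pairs whose count reaches minSupport.
    static Map<String, Integer> frequentPairs(List<List<String>> orders, int minSupport) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (List<String> order : orders) {
            // Deduplicate and sort, so each pair is counted once per order under one canonical key.
            List<String> items = new ArrayList<String>(new TreeSet<String>(order));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + " & " + items.get(j), 1, Integer::sum);
                }
            }
        }
        // Drop the pairs that occur in fewer than minSupport orders.
        counts.values().removeIf(count -> count < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        // Prints the frequent pairs in alphabetical order, one "pair: count" per line.
        frequentPairs(sampleOrders(), 2)
            .forEach((pair, count) -> System.out.println(pair + ": " + count));
    }
}
```

With a threshold of 2, seven pairs remain; Product 2 & Product 3 is the most frequent, appearing in three orders (1234, 1235 and 1237), while pairs such as Product 1 & Product 4 appear only once and are dropped. A real market basket analysis extends the same counting to larger item sets.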

Is distributed computing possible with JDMP?

JDMP was designed to handle large amounts of data efficiently, so distributed computing has always been one of its design goals. JDMP separates calculation methods (Algorithm) from data structures (Matrix, Variable, Sample, DataSet); these can be combined into containers (Module). All of these objects can be declared “global”, which means that they are shared across a computer network. You can then, for example, request data (a Matrix) from one Variable (or a Sample from a DataSet), process it with an Algorithm, and write the result to another Variable or DataSet. The two Variables could reside on different computers, but for the programmer it makes no difference whether a Variable is global or not. This concept makes it easy to create distributed algorithms: you only have to think about which Variables are needed and which processing steps have to be executed. This is probably the biggest difference from other toolkits such as Weka4WS, where the algorithm itself cannot be split.

JDMP uses JGroups (http://www.jgroups.org) to share data across a computer network. Currently, only global Variables are supported; distributed Samples, DataSets, Modules and Algorithms have not been implemented yet.
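The idea that an algorithm should not care whether a Variable is local or global can be sketched with a shared interface. This is a hypothetical, single-process illustration, not JDMP's actual Variable API: `SharedVariable`, `LocalVariable`, `GlobalVariable` and `scale` are invented names, and the "global" store is a plain in-memory map standing in for the network replication that JDMP performs with JGroups.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class VariableSketch {

    // Hypothetical interface: the only view of a variable that a processing step gets.
    interface SharedVariable {
        double[] get();
        void set(double[] value);
    }

    // A variable whose value lives only in this object.
    static class LocalVariable implements SharedVariable {
        private double[] value = new double[0];
        public synchronized double[] get() { return value.clone(); }
        public synchronized void set(double[] value) { this.value = value.clone(); }
    }

    // Stand-in for a "global" variable: values are looked up by name in a shared store.
    // Here the store is an in-process map; a real system would replicate it across the
    // network (JDMP uses JGroups for this, which this sketch does not model).
    static class GlobalVariable implements SharedVariable {
        private static final Map<String, double[]> STORE = new ConcurrentHashMap<String, double[]>();
        private final String name;
        GlobalVariable(String name) { this.name = name; }
        public double[] get() {
            double[] value = STORE.get(name);
            return value == null ? new double[0] : value.clone();
        }
        public void set(double[] value) { STORE.put(name, value.clone()); }
    }

    // A processing step: read one variable, transform the data, write another.
    // It depends only on the interface, so it cannot tell local from global variables.
    static void scale(SharedVariable input, SharedVariable output, double factor) {
        double[] data = input.get();
        double[] result = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = data[i] * factor;
        }
        output.set(result);
    }

    public static void main(String[] args) {
        SharedVariable input = new LocalVariable();
        input.set(new double[] {1.0, 2.0, 3.0});
        SharedVariable output = new GlobalVariable("scaled");
        scale(input, output, 2.0);
        System.out.println(Arrays.toString(output.get())); // prints [2.0, 4.0, 6.0]
    }
}
```

Because `scale` only sees `SharedVariable`, swapping a local Variable for a global one requires no change to the processing code, which is the property the answer above describes.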

What analysis methods are supported?

Most of JDMP's analysis methods are for classification tasks: Naive Bayes, Neural Networks and classification through regression are supported directly, Support Vector Machines through the LibSVM library, and many more classifiers through the Weka machine learning package.
Bayesian Networks, clustering and optimization methods are still experimental.

What scripting language does JDMP support?

JDMP was designed as a Java library, so we do not currently support a scripting language like R or Splus. However, we are considering a Matlab-like syntax for one of the next releases. It is already possible to execute functions in Matlab, Octave or R directly from JDMP; S and Splus are not supported.

Can I use JDMP with Java 1.4?

No, unfortunately not. JDMP is designed for Java 6 and uses generics and other features introduced in Java 5.0, so it will not run on earlier versions without modifications.

Is there a version for C/C++ or another programming language?

No, JDMP is available only for Java. However, if you are interested in porting the software to another language, you are encouraged to do so.