Interactive Dimensionality Reduction Tool

This JavaScript-based tool was developed as a proof of concept for the method described in the paper "Interactive Visual Data Exploration with Subjective Feedback", which has been accepted to the conference track of ECML-PKDD'16.

In this paper, we introduce a novel generic method for interactive visual exploration of high-dimensional data. Unlike most visualization tools, it does not rely on the traditional paradigm of manually zooming and rotating the data. Instead, the tool first presents the user with an 'interesting' projection of the data. It then uses constrained data randomization, so users can flexibly and intuitively express their interests or beliefs through visual interactions that correspond to exactly defined constraints. A projection-finding algorithm takes these user-defined constraints into account and computes a new 'interesting' projection. The process can be iterated until the user runs out of time or finds that the constraints explain everything she wants to learn from the data.
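The loop of constraining, randomizing and re-projecting can be illustrated with a small Python sketch. This is not the paper's algorithm. It is a minimal illustration built on our own assumptions:

- The user's constraints are represented as disjoint groups of row indices. The `groups` argument is hypothetical.
- The randomization permutes each column independently within each group. Rows outside all groups form one background group.
- The 'interesting' projection is taken to be the directions that maximize the ratio of data variance to randomized-data variance. This is a generalized eigenvalue problem.

Both function names are hypothetical.

```python
import numpy as np


def randomize_with_constraints(X, groups, rng):
    """Hypothetical constrained randomization (assumption, not the paper's).

    `groups` is a list of disjoint index arrays standing in for the user's
    constraints; rows not covered by any group form one background group.
    Each column is permuted independently within each group, which keeps the
    per-group column values but destroys dependencies between columns.
    """
    Xr = X.copy()
    covered = np.zeros(X.shape[0], dtype=bool)
    blocks = []
    for g in groups:
        g = np.asarray(g)
        covered[g] = True
        blocks.append(g)
    blocks.append(np.flatnonzero(~covered))
    for idx in blocks:
        if idx.size < 2:
            continue  # nothing to shuffle
        for j in range(X.shape[1]):
            Xr[idx, j] = X[rng.permutation(idx), j]
    return Xr


def most_different_directions(X, Xr, k=2, eps=1e-9):
    """Return k unit directions w maximizing (w^T C w) / (w^T Cr w).

    C is the covariance of the real data and Cr that of the randomized
    (background) data; a large ratio means the data shows structure the
    constraints do not yet explain.
    """
    C = np.cov(X, rowvar=False)
    Cr = np.cov(Xr, rowvar=False) + eps * np.eye(X.shape[1])
    # Whiten w.r.t. the background: W^T Cr W = I.
    evals, evecs = np.linalg.eigh(Cr)
    W = evecs / np.sqrt(evals)
    # In whitened coordinates the ratio becomes an ordinary Rayleigh quotient.
    mvals, mvecs = np.linalg.eigh(W.T @ C @ W)
    order = np.argsort(mvals)[::-1][:k]
    dirs = W @ mvecs[:, order]
    return dirs / np.linalg.norm(dirs, axis=0)
```

In each iteration, the user's new constraints would be added to `groups`, the data re-randomized, and the data plotted on the two returned directions (`X @ dirs`). Other scores for "interestingness", such as directions where the data has *less* variance than the background, would change only the eigenvalue ordering.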

The tool operates in the following way:

We present the tool by means of two case studies, one controlled study on synthetic data and another on real census data.

Synthetic Dataset Case Study

In this case study, the user works on a sub-sample (250 data points) of a synthetic dataset. The full synthetic dataset consists of 1000 10-dimensional data vectors:

- Dimensions 1-4 can be clustered into five clusters.
- Dimensions 5-6 can be clustered into four clusters involving different subsets of the data points.
- Dimensions 7-10 are Gaussian noise.

All dimensions have equal variance. The sub-sampled dataset is z-scored. Loading the case study takes about 20 seconds.
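The following Python sketch generates data with the same layout. It is not the authors' generator, and several details are our own assumptions:

- The cluster centres, their spread and the noise level are made up.
- The two clusterings use independent random label assignments, to mimic "different subsets of data points".
- The random seeds are arbitrary.

The sketch then sub-samples 250 points and z-scores them.

```python
import numpy as np


def make_synthetic(n=1000, seed=0):
    """Hypothetical generator mimicking the described synthetic dataset."""
    rng = np.random.default_rng(seed)
    X = np.empty((n, 10))
    # Dimensions 1-4: five clusters (centre scale 3.0 is an assumption).
    labels_a = rng.integers(0, 5, size=n)
    centres_a = rng.normal(scale=3.0, size=(5, 4))
    X[:, 0:4] = centres_a[labels_a] + rng.normal(size=(n, 4))
    # Dimensions 5-6: four clusters over an independent assignment of points.
    labels_b = rng.integers(0, 4, size=n)
    centres_b = rng.normal(scale=3.0, size=(4, 2))
    X[:, 4:6] = centres_b[labels_b] + rng.normal(size=(n, 2))
    # Dimensions 7-10: pure Gaussian noise.
    X[:, 6:10] = rng.normal(size=(n, 4))
    # Rescale so that every dimension has equal (unit) variance.
    X /= X.std(axis=0)
    return X, labels_a, labels_b


def zscore(X):
    """Centre each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


X, la, lb = make_synthetic()
rng = np.random.default_rng(1)
idx = rng.choice(X.shape[0], size=250, replace=False)  # sub-sample without replacement
Xs = zscore(X[idx])
```

Because the two label vectors are drawn independently, a cluster in dimensions 1-4 generally mixes points from several clusters in dimensions 5-6. This is what makes the second structure visible only after the first has been explained by constraints.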

UCI Adult Dataset Case Study

In this case study, the user explores a real census dataset compiled from the UCI Adult Dataset. It consists of 218 sub-sampled data points and nine attributes:

- "Age" (integer, 17-19)
- "Education" (integer, 1-16)
- "HoursPerWeek" (integer, 1-99)
- "EG_White" (binary, {"No" = 0, "Yes" = 1})
- "EG_AsianPacIlander" (binary)
- "EG_Black" (binary)
- "EG_Other" (binary)
- "Gender" (binary, {"Female" = 0, "Male" = 1})
- "Income" (binary, {"<50k" = 0, ">50k" = 1})

Here "EG_" stands for "Ethnic Group". The sub-sampled dataset is z-scored. Loading the case study takes about 15 seconds.
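The following Python sketch shows one way a raw Adult record could be mapped onto the nine attributes above and then z-scored. It is not the authors' preprocessing code, and it rests on these assumptions:

- The raw field names (`age`, `education-num`, `hours-per-week`, `race`, `sex`, `income`) and the value strings follow the UCI Adult attribute description.
- Every race value other than White, Asian-Pac-Islander and Black is folded into "EG_Other".
- The two records are made up purely for illustration.
- `encode` and `zscore` are hypothetical names.

```python
import numpy as np

# Column order as listed in the case-study description.
COLUMNS = ["Age", "Education", "HoursPerWeek", "EG_White",
           "EG_AsianPacIlander", "EG_Black", "EG_Other", "Gender", "Income"]


def encode(record):
    """Map one raw record (dict) to the nine numeric attributes."""
    race = record["race"]
    return [
        float(record["age"]),
        float(record["education-num"]),
        float(record["hours-per-week"]),
        1.0 if race == "White" else 0.0,
        1.0 if race == "Asian-Pac-Islander" else 0.0,
        1.0 if race == "Black" else 0.0,
        # Assumption: all remaining race values count as EG_Other.
        0.0 if race in ("White", "Asian-Pac-Islander", "Black") else 1.0,
        1.0 if record["sex"] == "Male" else 0.0,    # Female = 0, Male = 1
        1.0 if record["income"] == ">50K" else 0.0,  # <=50K = 0, >50K = 1
    ]


def zscore(X):
    """Z-score each column; constant columns are left at 0."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - X.mean(axis=0)) / std


# Made-up example records, for illustration only.
rows = [
    {"age": 39, "education-num": 13, "hours-per-week": 40,
     "race": "White", "sex": "Male", "income": "<=50K"},
    {"age": 50, "education-num": 9, "hours-per-week": 45,
     "race": "Black", "sex": "Female", "income": ">50K"},
]
raw = np.array([encode(r) for r in rows])
X = zscore(raw)
```

After z-scoring, the binary attributes are no longer 0/1. Their values depend on the proportion of ones in the sample, which should be kept in mind when reading coordinates in the projections.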

You can download the R implementation of our tool, as well as the runtime experiment (paper Sec. 3.3), via this link.