
DatasetExplorer

DatasetExplorer is a Java application for exploring and searching large collections of annotated images. It is based on the Galatee library, a library developed to provide fast, convenient and easy-to-use GUI components for browsing and searching image collections and annotated images.

As its name indicates, the DatasetExplorer application is mainly dedicated to exploring training datasets in the context of automatic image annotation.

Feel free to send me an email at nicolas.james@gmail.com if you are interested in this project.

Features

  • a dataset can be represented by:
    • a directory (possibly with sub-directories): for instance Caltech 101 (http://www.vision.caltech.edu/Image_Datasets/Caltech101/) or the University of Washington Image Database (http://www.cs.washington.edu/research/imagedatabase/).
    • a TAR archive: for instance the ImageNet image database.
      The TAR archive is not unpacked: the Galatee library uses Apache Commons VFS to read data directly from the TAR file.
    • a text file that contains file paths to images: the file can contain relative paths, in which case you have to specify a path prefix.
      Relative paths are very useful if your dataset is on an external hard drive, or if you work on several machines and the location of the dataset differs between them.
      A very important feature of this format is that you can add textual annotations in the file. Here is an example of such a file: the first two images that populate the concept 042_Castle in LSCOM annotation v1.0. Each line holds the image path followed by the LSCOM concepts of which the image is an instance (042_Castle itself is listed last):
      TRECVID2005_145/shot145_102_RKF.jpg,140_Steeple,107_Standing,101_Urban_Park,224_Outdoor,434_Tower,208_Urban_Scenes,181_Adult,363_Pavilions,361_Overlaid_Text,290_Daytime_Outdoor,210_Animal,400_Runway,316_Group,226_Building,042_Castle
      TRECVID2005_148/shot148_491_RKF.jpg,153_Landscape,309_Free_Standing_Structures,442_Sidewalks,235_Vegetation,224_Outdoor,072_Graveyard,115_Commercial_Advertisement,361_Overlaid_Text,290_Daytime_Outdoor,400_Runway,435_Trees,226_Building,042_Castle
    • a text file that contains URIs to images: the files are downloaded to a temporary directory. As above, you can specify a URI prefix, so the file can contain relative paths.
      As above, you can add textual annotations to the text file (with the same format as the file path list).
    • a location that contains an instance of an IIDF model.
    • a PascalVOC dataset.
  • visualization of a list of images (with associated metadata): images are referenced by a URI object. The URI scheme can be file for a local image file, or http for an image file accessible via HTTP,
  • textual search in the image list (based on the value of the image URI and on the image description) using a Lucene in-memory index,
  • downloading, in-memory loading and resizing of an image are performed only when necessary (load-on-view: an image is loaded when you actually see it),
  • multi-threading for the downloading, loading, resizing of the images,
  • configuration via the properties files DatasetExplorer.properties and Galatee.properties:
    1. customize the item visualization (image size, text area size),
    2. cache directory for downloaded files.
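As a rough illustration of the annotated file-list format described above, the following sketch parses one line into an image URI and its concept labels. It is not Galatee's parser: the class names AnnotatedListParser and Entry, the example prefix http://example.org/lscom/, and the use of java.net.URI to resolve relative paths against the prefix are assumptions made for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical parser for the annotated file-list format:
 * each line is "relativePath,concept1,concept2,...".
 * AnnotatedListParser and Entry are illustrative names, not Galatee classes.
 */
public class AnnotatedListParser {

    /** One dataset entry: the resolved image URI and its concept labels. */
    public static final class Entry {
        public final URI uri;
        public final List<String> annotations;

        Entry(URI uri, List<String> annotations) {
            this.uri = uri;
            this.annotations = annotations;
        }
    }

    /**
     * Parses one line of the listing. The prefix must be an absolute URI ending
     * with '/', and the relative path must be a valid URI reference
     * (e.g. spaces percent-encoded), otherwise URI.resolve throws.
     */
    public static Entry parseLine(String line, URI prefix) {
        String[] fields = line.trim().split(",");
        // First field: the image path, resolved against the prefix
        URI uri = prefix.resolve(fields[0].trim());
        // Remaining fields: the textual annotations (concept names)
        List<String> labels = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            String label = fields[i].trim();
            if (!label.isEmpty()) {
                labels.add(label);
            }
        }
        return new Entry(uri, Collections.unmodifiableList(labels));
    }

    public static void main(String[] args) {
        URI prefix = URI.create("http://example.org/lscom/");
        Entry e = parseLine("TRECVID2005_145/shot145_102_RKF.jpg,140_Steeple,224_Outdoor,042_Castle", prefix);
        System.out.println(e.uri);
        // prints http://example.org/lscom/TRECVID2005_145/shot145_102_RKF.jpg
        System.out.println(e.annotations);
        // prints [140_Steeple, 224_Outdoor, 042_Castle]
    }
}
```

For a local file path list, java.nio.file.Paths.get(prefix).resolve(relativePath) would play the role that URI.resolve plays here.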

Download

The DatasetExplorer application (and some of the libraries it uses) is built on a GNU/Linux platform, but it should also work on the MS-Windows platform (if not, drop me a line at nicolas.james@gmail.com about any problem you encounter).

However, the application uses some external applications (like those of the ImageMagick project) on GNU/Linux systems, through a special class called SystemCommandHandler2 which is only available for GNU/Linux, so some features are only available on GNU/Linux and Unix systems.
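The actual SystemCommandHandler2 class is not documented on this page; the sketch below only illustrates the general idea of delegating image processing to an external ImageMagick command from Java with ProcessBuilder, and of refusing to do so on MS-Windows. The class ExternalResize and its methods are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper, NOT the SystemCommandHandler2 API: it delegates
 * an image resize to ImageMagick's "convert" command on Unix-like systems.
 */
public class ExternalResize {

    /** Rough check: anything that is not MS-Windows is treated as Unix-like. */
    static boolean isUnixLike() {
        String os = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        return !os.startsWith("windows");
    }

    /** Builds "convert IN -resize WxH OUT"; WxH fits the image in the box, keeping its aspect ratio. */
    static List<String> resizeCommand(String in, String out, int width, int height) {
        return Arrays.asList("convert", in, "-resize", width + "x" + height, out);
    }

    /** Runs the command and returns its exit code (requires ImageMagick on the PATH). */
    static int resize(String in, String out, int width, int height)
            throws IOException, InterruptedException {
        if (!isUnixLike()) {
            throw new UnsupportedOperationException("external commands are only used on GNU/Linux and Unix");
        }
        Process process = new ProcessBuilder(resizeCommand(in, out, width, height))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command line; running it needs ImageMagick installed
        System.out.println(String.join(" ", resizeCommand("in.jpg", "thumb.jpg", 200, 200)));
        // prints: convert in.jpg -resize 200x200 thumb.jpg
    }
}
```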

Installation

  1. download the archive and unpack it.
  2. edit the file Galatee.properties and update the value of PROPERTY_DEFAULT_TEMP_DIRECTORY: it must be the path to a temporary directory.
    Note: on the MS-Windows platform, the backslash character must be escaped in a Java properties file, i.e. write a double backslash instead of a single one.
  3. under a GNU/Linux platform: update line 4 of the shell script DatasetExplorer.sh with the path where you have installed DatasetExplorer, then launch this script.
    under an MS-Windows platform (if you are not lucky): launch the batch script DatasetExplorer.bat.
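The escaping rule in step 2 comes from the way java.util.Properties reads a file: a backslash starts an escape sequence. This small demo, assuming the key in Galatee.properties is literally PROPERTY_DEFAULT_TEMP_DIRECTORY and using the illustrative path C:\temp\galatee, shows what happens to an escaped and to an unescaped Windows path.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Shows why backslashes must be doubled in a .properties file on MS-Windows:
 * java.util.Properties treats '\' as an escape character when loading.
 */
public class PropertiesEscapeDemo {

    /** Loads the given properties-file text and returns the value of key. */
    static String load(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected with a StringReader
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "PROPERTY_DEFAULT_TEMP_DIRECTORY";
        // File text: PROPERTY_DEFAULT_TEMP_DIRECTORY=C:\\temp\\galatee (escaped)
        String ok = load(key + "=C:\\\\temp\\\\galatee", key);
        // File text: PROPERTY_DEFAULT_TEMP_DIRECTORY=C:\temp\galatee (not escaped):
        // the backslash-t pair is read as a TAB character, backslash-g as a plain "g"
        String broken = load(key + "=C:\\temp\\galatee", key);
        System.out.println(ok);                             // prints C:\temp\galatee
        System.out.println(broken.replace("\t", "<TAB>"));  // prints C:<TAB>empgalatee
    }
}
```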

Plugins

OpenCV Haar Classifier Cascade plugin

OpenCV Face and Eye Detector plugin

Image Datasets & Databases browsing

ImageNet

In the DatasetExplorer: File > Load Dataset, then use the TAR archive explorer option, select the archive file and open it.
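DatasetExplorer reads the TAR archive through Apache Commons VFS without unpacking it. To illustrate why this is possible, the following plain-JDK sketch (not the VFS-based code that Galatee uses) lists the regular files of an uncompressed TAR archive by reading only the 512-byte entry headers and skipping the file contents. It assumes a POSIX ustar or GNU archive and does not handle GNU long-name entries or base-256 sizes; a .tar.gz file would first have to be wrapped in a java.util.zip.GZIPInputStream.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Lists the regular files of an uncompressed TAR archive by reading only the
 * 512-byte headers and skipping the file data: nothing is unpacked to disk.
 * Plain-JDK illustration only; Galatee itself relies on Apache Commons VFS.
 */
public class TarLister {

    private static final int BLOCK = 512;

    /** Reads a NUL-terminated ASCII field from a header block. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    /** Skips exactly n bytes, or throws EOFException if the stream ends first. */
    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException("truncated TAR entry");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    public static List<String> listFiles(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<String> names = new ArrayList<>();
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                data.readFully(header);
            } catch (EOFException e) {
                break;  // archive without the usual trailing zero blocks
            }
            if (header[0] == 0) {
                break;  // an empty name marks the end-of-archive zero block
            }
            String name = field(header, 0, 100);
            if (field(header, 257, 6).equals("ustar")) {  // POSIX "ustar\0" magic
                String prefix = field(header, 345, 155);   // long-name prefix
                if (!prefix.isEmpty()) {
                    name = prefix + "/" + name;
                }
            }
            // Size is stored as octal ASCII digits
            String sizeText = field(header, 124, 12).trim();
            long size = sizeText.isEmpty() ? 0 : Long.parseLong(sizeText, 8);
            byte type = header[156];
            if (type == '0' || type == 0) {
                names.add(name);  // regular file
            }
            // Entry data is padded to a multiple of 512 bytes
            skipFully(data, (size + BLOCK - 1) / BLOCK * BLOCK);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            for (String name : listFiles(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Usage: java TarLister /path/to/archive.tar prints one entry name per line.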

PascalVOC

Using the DatasetExplorer application

  • Since version DatasetExplorer-1.0-2010-12.01, visualizing a PascalVOC dataset is an option of DatasetExplorer: in the File menu > Load dataset, select the PascalVOC explorer tab, specify a directory that contains the JPEGImages and Annotations directories of a PascalVOC dataset, and click the Open button.

    Note: the very first time you open a PascalVOC dataset with DatasetExplorer, the indexing process can take several minutes (the time needed to parse all the XML annotation files), but the index is written to the temporary directory of DatasetExplorer, so this operation is not performed again the next times. So do not forget to set the path to the temporary directory in the Galatee.properties file.
  • A small program that generates a listing of the annotated images of the PascalVOC challenge datasets: PascalVOC-0.1-2010.06.23.jar.
    However, this code only works for the challenges since 2007, i.e. since annotations have been available in XML format.
    How it works:
    1. Update the PascalVOC.properties file with your configuration (i.e. with your installation path of the PascalVOC data)
      Don't forget to escape the backslash character in this file if you're using an MS-Windows platform.
    2. Execute the method PascalVOCStarter.createAnnotatedImagesListing().
    3. Use the DatasetExplorer application to read the file created at step 2, using the “local filepath list explorer” option and a prefix that is the path to the JPEGImages directory of your PascalVOC data.
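The PascalVOC-0.1 program is only distributed as a jar, so the following is a hypothetical sketch of the idea behind step 2, not its actual code: it reads one PascalVOC XML annotation with the JDK DOM parser and produces one line in the "path,label,label,..." format read by the local filepath list explorer. The class VocListingLine is illustrative, and it assumes the standard VOC layout (annotation root, filename element, object elements with a name child).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * Hypothetical sketch (not the code of PascalVOCStarter): turns one PascalVOC
 * XML annotation into a listing line "filename,label1,label2,...".
 */
public class VocListingLine {

    /** Text of the first direct child element named tag, or null if absent. */
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    static String toListingLine(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid annotation XML", e);
        }
        Element root = doc.getDocumentElement();
        String filename = childText(root, "filename");
        if (filename == null) {
            throw new IllegalArgumentException("no <filename> element");
        }
        // Distinct object class names, in order of first appearance; only the
        // direct <name> child of <object> is read, so <part><name> is ignored
        Set<String> labels = new LinkedHashSet<>();
        NodeList objects = root.getElementsByTagName("object");
        for (int i = 0; i < objects.getLength(); i++) {
            String label = childText((Element) objects.item(i), "name");
            if (label != null) {
                labels.add(label);
            }
        }
        StringBuilder line = new StringBuilder(filename);
        for (String label : labels) {
            line.append(',').append(label);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String xml = "<annotation><filename>000001.jpg</filename>"
                + "<object><name>dog</name></object>"
                + "<object><name>person</name></object></annotation>";
        System.out.println(toListingLine(xml));  // prints 000001.jpg,dog,person
    }
}
```

Writing one such line per annotation file gives a listing whose relative paths resolve against the JPEGImages prefix of step 3.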

Implementing a lightweight browser using the Galatee and Jmagine libraries

  • An example that displays the data of the PascalVOC challenges (since the 2007 challenge, i.e. since annotation files are XML files), using the Galatee library for browsing the data and the Jmagine library for displaying images with their annotated polygons: PascalVOC-0.2-2010.07.01.tar.gz
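The PascalVOC-0.2 example uses Jmagine to draw the annotated polygons; since the Jmagine API is not described on this page, the sketch below shows the same idea with plain Java2D instead. The class PolygonOverlay is hypothetical; it turns a VOC bounding box into a 4-vertex polygon, shifting VOC's 1-based pixel coordinates to Java2D's 0-based ones.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Polygon;
import java.awt.image.BufferedImage;

/**
 * Plain Java2D sketch (not the Jmagine API) that outlines an annotated polygon
 * on an image, e.g. a PascalVOC bounding box turned into a 4-vertex polygon.
 */
public class PolygonOverlay {

    /** Draws the outline of the polygon onto the image, in the given color. */
    static void drawPolygon(BufferedImage image, Polygon polygon, Color color) {
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(color);
            g.drawPolygon(polygon);
        } finally {
            g.dispose();  // release the graphics context
        }
    }

    /**
     * Converts a VOC box (xmin, ymin, xmax, ymax; VOC pixel coordinates start at 1)
     * into a polygon in Java2D coordinates, which start at 0.
     */
    static Polygon boxToPolygon(int xmin, int ymin, int xmax, int ymax) {
        int[] xs = {xmin - 1, xmax - 1, xmax - 1, xmin - 1};
        int[] ys = {ymin - 1, ymin - 1, ymax - 1, ymax - 1};
        return new Polygon(xs, ys, 4);
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        drawPolygon(image, boxToPolygon(10, 10, 40, 30), Color.RED);
        // A pixel on the top edge is now red; the interior stays black
        System.out.printf("%06X%n", image.getRGB(20, 9) & 0xFFFFFF);   // prints FF0000
        System.out.printf("%06X%n", image.getRGB(20, 20) & 0xFFFFFF);  // prints 000000
    }
}
```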

License GPL

DatasetExplorer uses the following libraries:

  • Apache Batik, commons-vfs, commons-codec, commons-httpclient, commons-io, commons-lang, commons-logging, log4j, lucene-core: under Apache license
  • JUI: under GPL license
  • netbeans swing outline: under GPLv2 license
  • some of my libraries: Galatee, Tinker, Jmagine, FSExplorer: under GPLv2 license
  • conja
  • JAI: Java Research License (JRL)

DatasetExplorer, a Java application for exploring collections of images in Java.
Copyright © 2009-2010, Nicolas James.
http://njames.trevize.net/wiki/projects:DatasetExplorer

DatasetExplorer is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
Galatee is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this Module; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.

Last modified: 2011/07/07 11:20 by njames

 © Nicolas James 2009-2011
