
What is Bouquinoscope?

The idea behind this project is to reuse code developed for my PhD thesis (code for mining Web image search engines, e.g. Google Image, PicasaWeb, Flickr, PicSearch, etc.) to build a book cover image searcher and viewer, i.e. a Web application that retrieves the image of a book cover from a query composed of a few keywords.

This is achieved using AWS (Amazon Web Services) and Google Image, on the assumption that with a “good query” a relevant image is among the first 20 results (as when searching for an image related to a named entity, i.e. an author name or a book title).
Users of the Bouquinoscope web application can build a profile that contains a book cover collection.

The data contained in the profile is stored using a dedicated XML schema that embeds the Dublin Core schema for the bibliographic references. For the export of the data (in order to build an XHTML view of the profile), the XML Dublin Core elements are rewritten in RDFa.

The project is available on my SVN: http://svn.trevize.net/Bouquinoscope. A WAR archive is also available here (authentication is required, email me).

Testing it

How does it work?

A servlet makes a Google Image search query or some calls to AWS, parses the results page, fetches the images, copies them to a web repository, and builds a result page that lets the user choose one image and add it to their profile. The user can also remove an image from their profile.
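The "parse the results page" step can be illustrated with a minimal sketch. It uses a plain regular expression rather than the HTML parsing library used in the project, and the class name, method name and sample page are all hypothetical. It simply collects the first image URLs found in a results page, up to a limit such as the 20 results mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the project's actual parser.
class ResultPageSketch {
    // Captures the value of the src attribute of each <img> tag.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns at most 'limit' image URLs, in page order.
    static List<String> imageUrls(String html, int limit) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (urls.size() < limit && m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up page content, for illustration only.
        String page = "<td><img class=\"t\" src=\"http://example.org/a.jpg\" alt=\"\"/></td>"
                + "<td><img src='http://example.org/b.jpg'/></td>";
        System.out.println(imageUrls(page, 20));
    }
}
```

A regular expression is fragile on real-world HTML, so a proper parser remains the safer choice for anything beyond a sketch.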

A profile is a list of images (book covers). On the server side, it is an XML file containing the list of images and captions (title and author of the book). These XML files conform to the imageLibrary.xsd schema, which embeds the Dublin Core XSD schema: each item is referenced via an XML Dublin Core element (see http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/).
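As a rough illustration of what such a profile file might look like, the sketch below builds one with the standard Java DOM API. The two namespaces and the dublinCoreT element name come from the fragment shown in the v0.3 notes. The imageLibrary and item element names and the file attribute are assumptions, since imageLibrary.xsd itself is not reproduced on this page.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch; the real layout is defined by imageLibrary.xsd.
class ProfileXmlSketch {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String APP_NS = "http://trevize.net/abirproto";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Each book is {image file name, title, creator}.
    static String buildProfile(String[][] books) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // Assumed root element name; prefixes are declared once on the root.
            Element root = doc.createElementNS(APP_NS, "app:imageLibrary");
            root.setAttributeNS(XMLNS_NS, "xmlns:app", APP_NS);
            root.setAttributeNS(XMLNS_NS, "xmlns:dc", DC_NS);
            doc.appendChild(root);

            for (String[] book : books) {
                Element item = doc.createElementNS(APP_NS, "app:item"); // assumed name
                item.setAttribute("file", book[0]);                     // assumed attribute
                Element meta = doc.createElementNS(APP_NS, "app:dublinCoreT");
                Element title = doc.createElementNS(DC_NS, "dc:title");
                title.setTextContent(book[1]);
                Element creator = doc.createElementNS(DC_NS, "dc:creator");
                creator.setTextContent(book[2]);
                meta.appendChild(title);
                meta.appendChild(creator);
                item.appendChild(meta);
                root.appendChild(item);
            }

            // Serialize the DOM tree with an identity transformation.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build profile XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildProfile(new String[][] {
            {"41JVJTDKN6L._SL160_.jpg", "La Course au mouton sauvage", "Haruki Murakami"}
        }));
    }
}
```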

To display a profile, the data is exported as an XHTML+RDFa fragment.
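One possible shape for such a fragment is sketched below: a div that declares the Dublin Core prefix, names the image with the RDFa about attribute, and carries the title and creator in property attributes. The exact markup produced by Bouquinoscope is not shown on this page, so the element layout, class name and method names here are assumptions.

```java
// Hypothetical sketch of an XHTML+RDFa fragment generator.
class RdfaExportSketch {
    // Minimal escaping for text and attribute values in XHTML.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Describes one book cover: the image is the subject, dc:title and
    // dc:creator are its properties.
    static String toRdfa(String imageUrl, String title, String creator) {
        String url = escape(imageUrl);
        return "<div xmlns:dc=\"http://purl.org/dc/elements/1.1/\" about=\"" + url + "\">\n"
             + "  <img src=\"" + url + "\" alt=\"" + escape(title) + "\" />\n"
             + "  <span property=\"dc:title\">" + escape(title) + "</span>\n"
             + "  <span property=\"dc:creator\">" + escape(creator) + "</span>\n"
             + "</div>";
    }

    public static void main(String[] args) {
        System.out.println(toRdfa(
                "http://webapps.trevize.net/Bouquinoscope/profiles/admin/41JVJTDKN6L._SL160_.jpg",
                "La Course au mouton sauvage",
                "Haruki Murakami"));
    }
}
```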

The current version of Bouquinoscope can be tested here: http://webapps.trevize.net/Bouquinoscope/Bouquinoscope.

The export functionality allows the profile to be included in a page of another website. It accepts some GET parameters, for instance: http://webapps.trevize.net/Bouquinoscope/Bouquinoscope?action=export&profile=admin&nbcol=8.
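To make the role of these parameters concrete, here is a small sketch that reads action, profile and nbcol from a query string. In the real servlet the container would normally supply them through the request object. The fallback column count and all names other than the three parameters are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of export parameter handling.
class ExportParamsSketch {
    // Splits "a=1&b=2" into a map, decoding percent-escapes.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Number of columns for the exported table; falls back when nbcol is
    // missing, not a number, or not positive.
    static int columns(Map<String, String> params, int fallback) {
        try {
            int n = Integer.parseInt(params.getOrDefault("nbcol", ""));
            return n > 0 ? n : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Map<String, String> p = parseQuery("action=export&profile=admin&nbcol=8");
        System.out.println(p.get("action") + " " + p.get("profile") + " " + columns(p, 4));
    }
}
```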

In the exported XHTML code, references are made to a JavaScript tooltip (http://www.walterzorn.com/tooltip/tooltip_e.htm#download) that displays the DublinCore metadata (creator, title). A good way to include the profile in a Web page with PHP is:

<div id="content">
<div id="bouquinoscope">
<?php $file = file_get_contents('http://webapps.trevize.net/Bouquinoscope/Bouquinoscope?action=export&profile=admin'); echo $file;?>
</div>
</div>

Development track

  • bug while exporting the last item (HTML id attribute defined twice): change the id just for the “exported last item”.
  • export only the last item.
  • add an extra parameter to list items in reverse chronological order.
  • RDFa for the Dublin Core metadata, and pass XHTML validation.

Older version notes

v0.4

2009.10.14 @ 8:58PM
Due to the update of the AWS Product Advertising API (all calls to the Product Advertising API must now be authenticated using request signatures), the project had stopped working until tonight.

I worked on it this evening and wrote a new AWSBookCoverFetcher class, using the Product Advertising API Signed Requests Sample Code - Java REST/QUERY here.
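For readers unfamiliar with request signatures, the following sketch shows the general idea as I understand the scheme of that period: sort the parameters, build a canonical string from the HTTP verb, host, path and query, compute an HMAC-SHA256 with the secret key, and append the Base64 result as a Signature parameter. This is not the AWSBookCoverFetcher code nor Amazon's sample code. The class name, host, path and key values are placeholders, and a real request also needs parameters such as the access key and a timestamp.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of HMAC-SHA256 request signing.
class SignedRequestSketch {
    // Percent-encodes following RFC 3986 (URLEncoder differs on ' ', '*', '~').
    static String rfc3986(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")
                    .replace("*", "%2A")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Parameters sorted by name, encoded and joined with '&'.
    static String canonicalQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(rfc3986(e.getKey())).append('=').append(rfc3986(e.getValue()));
        }
        return sb.toString();
    }

    // Returns the full URL with the Signature parameter appended.
    static String sign(String host, String path, Map<String, String> params, String secretKey) {
        String query = canonicalQuery(params);
        String toSign = "GET\n" + host + "\n" + path + "\n" + query;
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal(toSign.getBytes(StandardCharsets.UTF_8));
            String signature = Base64.getEncoder().encodeToString(raw);
            return "http://" + host + path + "?" + query + "&Signature=" + rfc3986(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("signing failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new TreeMap<>();
        params.put("Operation", "ItemSearch");          // illustrative values only
        params.put("Keywords", "Haruki Murakami");
        System.out.println(sign("example.invalid", "/onca/xml", params, "PLACEHOLDER-SECRET"));
    }
}
```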

The project works again. The next tasks are on the code itself (making it more extensible; for the moment it is not easy to get into), and I am thinking about using RDFa for the DublinCore information, adding some export parameters (the order of the book list), recording the date a book was added, etc.

v0.3

2008.10.29 @ 2:53PM
Amazon Web Services is now used to retrieve images.
I can produce DublinCore metadata in a DublinCore RDF annotated XHTML document, but for now I use an XML version of DublinCore.

A simple way to add DublinCore metadata to XHTML is to use the HTML META markup in the HEAD section, but such metadata applies to the XHTML document as a whole. What I want is to associate DublinCore metadata with an HTML DIV tag.
For now the project uses an XML binding of the DublinCore XSD grammar (see http://dublincore.org/schemas/xmls/simpledc20021212.xsd), and I include an XML fragment representing the DublinCore metadata in the page. For instance, the fragment associated with an image (a book cover) could be included in an HTML result page like this:

<td class="thumbnail_outer" id="id_41JVJTDKN6L._SL160_.jpg">
<img class="thumbnail" src="http://webapps.trevize.net/Bouquinoscope/profiles/admin/41JVJTDKN6L._SL160_.jpg" alt="" onmouseover="dispMeta('id_41JVJTDKN6L._SL160_.jpg')" onmouseout="UnTip()" />
<div class="docmeta">
<ns3:dublinCoreT xmlns:ns2="http://purl.org/dc/elements/1.1/" xmlns:ns3="http://trevize.net/abirproto">
    <ns2:title>La Course au mouton sauvage</ns2:title>
    <ns2:creator>Haruki Murakami</ns2:creator>

</ns3:dublinCoreT></div></td>

Note that another way would be to produce RDF-DublinCore and use an XSL stylesheet to transform RDF to XML, but for now I use XML-DublinCore.

I added a JavaScript tooltip that uses a hidden DIV for the metadata: when the mouse cursor is over a thumbnail (a book cover image), some JavaScript code determines which thumbnail the mouse is over, retrieves the content of the hidden DIV, and displays the tooltip.

The tooltip used is: http://www.walterzorn.com/tooltip/tooltip_e.htm#download.

An export functionality has been added, allowing the profile to be included in a web page of another website. It accepts some GET parameters, for instance: http://webapps.trevize.net/Bouquinoscope/Bouquinoscope?action=export&profile=admin&nbcol=8

v0.2

2008.07.18 @ 10:05AM
I've written a WWWResourceDownloaderPool, which manages a pool of WWWResourceDownloaderThread objects and handles concurrency as a producer-consumer (ProdCon): the producer is the thread that “places a download order” and the consumer is the pool manager.
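The same producer-consumer idea can be sketched with the standard java.util.concurrent classes instead of a hand-written pool. This is not the WWWResourceDownloaderPool code: the class and method names are hypothetical, and fetch is a stand-in that does no network access. The executor's internal queue plays the role of the order list, and its worker threads are the consumers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a download pool built on ExecutorService.
class DownloadPoolSketch {
    private final ExecutorService pool;
    private final List<Future<String>> pending = new ArrayList<>();

    DownloadPoolSketch(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    // Producer side: place a download order; a worker picks it up later.
    void order(String url) {
        Callable<String> task = () -> fetch(url);
        pending.add(pool.submit(task));
    }

    // Placeholder for the real download; here it only labels the URL.
    String fetch(String url) {
        return "fetched:" + url;
    }

    // Waits for every order and returns the results in submission order.
    List<String> awaitAll() {
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> f : pending) {
                results.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        DownloadPoolSketch p = new DownloadPoolSketch(4);
        p.order("http://example.org/a.jpg");
        p.order("http://example.org/b.jpg");
        System.out.println(p.awaitAll());
    }
}
```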

I've experienced some unexplained behaviour in Java with FileImageOutputStream, because of caching I think: image files are not read correctly later in the code, after they have been written by a FileImageOutputStream object… maybe the JVM caches images… I've tried to load the images with an Image or ImageIcon object, but without success:

ImageIcon icon = new ImageIcon(i);
int image_status = icon.getImageLoadStatus();
switch(image_status){
 case MediaTracker.ABORTED:
  System.out.println("MediaTracker.ABORTED");
  break;
 case MediaTracker.COMPLETE:
  System.out.println("MediaTracker.COMPLETE");
  break;
 case MediaTracker.ERRORED:
  System.out.println("MediaTracker.ERRORED");
  break;
 case MediaTracker.LOADING:
  System.out.println("MediaTracker.LOADING");
  break;
}
int width = icon.getIconWidth();
int height = icon.getIconHeight();

However, loading the images with JIU works, and all caches seem to be flushed correctly… I can't explain that…

The ImageLibrary is now used only for the profile.

Multi-profile is implemented, but not multi-user (i.e. no password is asked for when creating a new profile), but it can be done by modifying the configuration file directly.

Some JavaScript functions have been added to support adding an image to the profile after a query, and removing an image from the profile on the profile page.

v0.1

2008.07.17 @ 7:08PM
Querying and retrieving images is done, as is browsing the results page.
It's a little quirky because I use some code developed for my thesis (which contains code for extracting image descriptors; this code could be rewritten with a better split and more modularity…).
There's one thing I can't explain: why does Firefox change the URL-encoded %27 into %2527?
HtmlParser has the same behaviour…
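A %2527 is what appears when a string that already contains %27 is percent-encoded a second time, because the % sign itself becomes %25. Whether that is what happened inside Firefox and HtmlParser here is only an assumption, but the effect is easy to reproduce with the standard URLEncoder (the class name below is hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Demonstrates how encoding twice turns %27 into %2527.
class DoubleEncodingSketch {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String once = encode("l'ami");   // the apostrophe becomes %27
        String twice = encode(once);     // the % of %27 becomes %25
        System.out.println(once);        // l%27ami
        System.out.println(twice);       // l%2527ami
    }
}
```

If that is the cause, the usual remedy is to make sure a URL is encoded exactly once, by passing the raw string to whichever component does the encoding.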

TODO:

  1. use a pool of threads to retrieve images (planned for the next version).
  2. do not use an ImageLibrary for the retrieved images (but only for the book cover list).
  3. find a way to allow selecting an image (for the “add an image” and “remove an image” actions).


 
projects/bouquinoscope.txt · Last modified: 2010/12/02 13:53 by njames

 © Nicolas James 2009-2011
