Multimodal Image Retrieval using PLSA and Microstructure Descriptor

Choiru Zain, Mahardhika Pratama


PLSA (Probabilistic Latent Semantic Analysis) and SIFT (Scale-Invariant Feature Transform) are widely used techniques that are regarded as the state of the art in multimodal image retrieval. However, SIFT extracts keypoints from the gray-scale image and typically produces a large number of them, each described by a 128-dimensional feature vector, while retaining no information about image color. This leads to an enormous number of descriptors, especially when SIFT is applied to a large database such as Flickr. By contrast, the Micro-Structure Descriptor (MSD) represents a full-color image as a single 72-dimensional feature vector that encodes color, texture, and shape information.
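To make the size difference concrete, the snippet below counts the descriptor values stored for a collection. The 128 and 72 dimensions come from the text; the one-million-image collection, the 1,000 keypoints per image, and 32-bit float storage are hypothetical assumptions chosen only for illustration, since real SIFT keypoint counts vary with image size and content.

```python
# Descriptor dimensions as stated in the abstract.
SIFT_DIM = 128  # values per SIFT keypoint descriptor
MSD_DIM = 72    # values in the single MSD vector per image
BYTES_PER_VALUE = 4  # assumption: descriptors stored as 32-bit floats


def sift_values(n_images: int, keypoints_per_image: int) -> int:
    """Total descriptor values when every keypoint keeps a 128-D vector."""
    return n_images * keypoints_per_image * SIFT_DIM


def msd_values(n_images: int) -> int:
    """Total descriptor values when each image is one 72-D MSD vector."""
    return n_images * MSD_DIM


if __name__ == "__main__":
    # Hypothetical collection size and keypoint count, not figures from the paper.
    n_images, keypoints = 1_000_000, 1_000
    sift = sift_values(n_images, keypoints)
    msd = msd_values(n_images)
    print(f"SIFT: {sift:,} values, {sift * BYTES_PER_VALUE:,} bytes")  # 128,000,000,000 values, 512,000,000,000 bytes
    print(f"MSD:  {msd:,} values, {msd * BYTES_PER_VALUE:,} bytes")    # 72,000,000 values, 288,000,000 bytes
```

Even a single SIFT keypoint (128 values) is larger than an image's entire MSD vector (72 values), so the gap widens with every additional keypoint per image.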

This paper presents a PLSA-based multimodal image retrieval system that uses MSD for feature extraction. In the evaluation, we compare the proposed system with a PLSA-based multimodal image retrieval system that uses SIFT for feature extraction. Extensive experimental results show that the PLSA-MSD image retrieval system is more efficient than PLSA-SIFT, being 300% faster in terms of computational speed. The results suggest that PLSA-MSD is suitable for large databases.
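The PLSA step can be sketched generically. The snippet below is a minimal, hypothetical illustration, not the authors' implementation: it fits the standard PLSA aspect model P(d, w) = P(d) Σ_z P(z|d) P(w|z) by expectation maximization on a small document-by-word count matrix, then ranks documents by cosine similarity between their topic mixtures P(z|d). The names `plsa`, `cosine`, and `rank`, the toy counts, and the use of cosine similarity are all assumptions; the abstract does not state how MSD vectors are turned into word counts, how the text modality is combined, or which similarity measure the system uses.

```python
import math
import random


def normalize(row):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM on a document-by-word count matrix n(d, w).

    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation of both conditional distributions.
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts n(d,w) * P(z|d,w).
                for z in range(n_topics):
                    exp_w_z[z][w] += n * post[z]
                    exp_z_d[d][z] += n * post[z]
        # M-step: renormalise expected counts into P(w|z) and P(z|d).
        p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z


def cosine(a, b):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank(query, p_z_d):
    """Return the other document indices, most similar topic mixture first."""
    scored = [(cosine(p_z_d[query], v), i) for i, v in enumerate(p_z_d) if i != query]
    return [i for _, i in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Toy counts (hypothetical): docs 0-1 use words 0-1, docs 2-3 use words 2-3.
    counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
    p_z_d, _ = plsa(counts, n_topics=2, n_iter=200)
    print(rank(0, p_z_d))
```

Because the EM objective is non-convex, the fitted topics depend on the random initialisation; fixing `seed` makes runs reproducible. One option for a multimodal setting is to fit topics over a vocabulary that joins visual and textual words, but that extension is not shown here.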


Keywords: Image Retrieval; Multimodal Data; PLSA; MSD





This work is licensed under a Creative Commons Attribution 3.0 License.