Retrieving Online Visual Propaganda

Domain Specific Large Scale Semantic Image Retrieval

The web has evolved into one of the largest repositories of data. This data is available in many forms: text, audio, images, and video. Social media websites are among the greatest contributors to the enormous expansion of these repositories. A major drawback of the way social media platforms deposit data on the web is the lack of a standard format and the absence of database organization or management. The problem is compounded by the difficulty of retrieving documents by their content. Semantic techniques can be a good option for retrieving text documents through a semantic bag of words; for image, audio, or video documents, however, the problem becomes even more difficult.

The direct method for retrieving social media images is to use their original tags. Keyword search over tags may not be efficient, and semantic search may satisfy only some image queries. Moreover, semantic search can suffer lower precision when a user's intent attaches a different meaning to similar images.

In this project, we propose a domain-specific semantic process for large-scale image retrieval that not only uses refined domain information to reduce the gap between social media image tags and the user's intent, but also avoids the losses caused by concept mixing in conventional image retrieval systems. In this effort, two domain-specific algorithms that build tag-domain sets for social media images based on a topic model are designed and applied to improve retrieval performance. A bag of visual words is combined with the semantic process to exclude unrelated social media images. Experiments evaluated with both F-Measure and NDCG@K indicate that the proposed method significantly improves the effectiveness of social image retrieval.
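The two evaluation measures named above have standard definitions; as a minimal illustrative sketch (not the project's evaluation code), F-Measure is the harmonic mean of precision and recall over a retrieved set, and NDCG@K discounts graded relevance by rank position and normalizes by the ideal ordering:

```python
import math

def f_measure(relevant, retrieved):
    """Harmonic mean of precision and recall for one query."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

def ndcg_at_k(gains, k):
    """NDCG@K: DCG of the ranked relevance gains, normalized by the ideal DCG.

    `gains` is the list of graded relevance scores in ranked order.
    """
    def dcg(g):
        return sum(v / math.log2(i + 2) for i, v in enumerate(g[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

For example, a ranking that places the most relevant images first achieves NDCG@K of 1.0, while any inversion lowers the score; this is why NDCG@K complements the set-based F-Measure in ranked image retrieval.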

PI and Contact Person: Saeid Belkasim (sbelkasim@gsu.edu)