ACM Journal of Data and Information Quality (JDIQ)

Latest Articles

The Challenge of “Quick and Dirty” Information Quality

Data Quality Challenges in Distributed Live-Virtual-Constructive Test Environments

Information Quality Research Challenge

As information technology becomes an integral part of daily life, increasingly, people understand the world around them by turning to digital sources as opposed to directly interacting with objects in the physical world. This has ushered in the age of Ubiquitous Digital Intermediation (UDI). With the explosion of UDI, the scope of Information...

Data Standards Challenges for Interoperable and Quality Data

Challenges for Context-Driven Time Series Forecasting

Predicting time series is a crucial task for organizations, since decisions are often based on uncertain information. Many forecasting models are...

Combining User Reputation and Provenance Analysis for Trust Assessment

Trust is a broad concept that in many systems is often reduced to user reputation alone. However, user reputation is just one way to determine trust....

Automatic Discovery of Abnormal Values in Large Textual Databases

Textual databases are ubiquitous in many application domains. Examples of textual data range from names and addresses of customers to social media...


In a manner similar to most organizations, BigCompany (BigCo) was determined to benefit strategically from its widely recognized and vast quantities of data. (U.S. government agencies make regular visits to BigCo to learn from its experiences in this area.) When faced with an explosion in data volume, increases in complexity, and a need to respond...


Jan. 2016 -- New book announcement


Carlo Batini and Monica Scannapieco have a new book:

Data and Information Quality: Dimensions, Principles and Techniques  

Springer Series: Data-Centric Systems and Applications, soon available from the Springer shop


Special issue on Web Data Quality

The goal of this special issue is to present innovative research in the areas of Web Data Quality Assessment and Web Data Cleansing. The editors of this special issue are Christian Bizer, Xin Luna Dong, Ihab Ilyas, and Maria-Esther Vidal. See the call for papers for more details.



New options for ACM authors to manage rights and permissions for their work

ACM introduces a new publishing license agreement, an updated copyright transfer agreement, and a new author-pays option which allows for perpetual open access through the ACM Digital Library. For more information, visit the ACM Author Rights webpage.


ICIQ 2015, the International Conference on Information Quality, will take place on July 24 at MIT in Cambridge, MA.

Experience and Challenge papers: JDIQ now accepts two new types of papers. Experience papers describe real-world applications, datasets, and other experiences in handling poor-quality data. Challenge papers briefly describe a novel problem or challenge for the IQ community. See the calls for papers for details.

Special Issue on Provenance and Quality of Data and Information: The term provenance refers broadly to information about the origin, context, derivation, lineage, ownership, or history of some artifact. More specifically, the provenance of data is a form of structured metadata that records the activities involved in producing the data. The notion applies to a broad variety of data types, from database records to scientific datasets, business transaction logs, web pages, social media messages, and more. At the same time, different definitions and measures of quality apply to each of these data types in different domains.

The JDIQ guest editors are Paolo Missier (Newcastle University, UK) and Paolo Papotti (Qatar Computing Research Institute, Qatar).
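The provenance notion in the call above (structured metadata recording the activities involved in data production) can be illustrated with a small lineage query. The sketch below is loosely inspired by the W3C PROV vocabulary of entities, activities, and agents; the `Derivation` record, the `lineage` function, and the file names are hypothetical and chosen only for illustration, not taken from any particular provenance system.

```python
from dataclasses import dataclass

# Hypothetical provenance record, loosely modeled on W3C PROV: one activity,
# run by one agent, used some input artifacts and generated one output artifact.
@dataclass(frozen=True)
class Derivation:
    output: str
    activity: str
    agent: str
    inputs: tuple

def lineage(records, artifact):
    """Walk upstream from `artifact`, returning the derivation steps that produced it."""
    by_output = {r.output: r for r in records}
    steps, stack, seen = [], [artifact], set()
    while stack:
        current = stack.pop()
        record = by_output.get(current)
        if record is None or current in seen:
            continue  # a raw source with no recorded derivation, or already visited
        seen.add(current)
        steps.append(record)
        stack.extend(record.inputs)
    return steps

records = [
    Derivation("report.pdf", "render_report", "analyst", ("cleaned.csv",)),
    Derivation("cleaned.csv", "deduplicate", "etl_job", ("raw_sensor.csv", "lookup.csv")),
]
for step in lineage(records, "report.pdf"):
    print(step.activity, "by", step.agent, "from", step.inputs)
```

Quality questions can then be asked of the recorded lineage, for example whether any upstream step was run by an agent whose output is considered unreliable.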

Forthcoming Articles

Replacing Mechanical Turkers? How to Evaluate Learning Results with Semantic Properties

Some machine learning algorithms offer more than just superior predictive power. They often generate additional information about the dataset on which they were trained, providing further insight into the underlying data. Examples include topic modeling algorithms such as Latent Dirichlet Allocation (LDA)~\cite{blei2003latent}, whose topics many researchers inspect as part of their data analysis. More recently, deep learning algorithms, such as the word embedding algorithm Word2Vec~\cite{mikolov2013distributed}, have produced models with semantic properties. These algorithms are immensely useful because they tell us something about the environment from which they generate their predictions. One pressing challenge is how to evaluate the quality of the information these algorithms produce. This evaluation, if done at all, is usually carried out through user studies. For LDA topics, researchers ask human subjects questions and observe how they understand different aspects of the topics~\cite{chang2009reading}. While this type of evaluation is sound, it is expensive in both time and cost, and thus cannot easily be reproduced independently. Such experiments are also hard to scale up and difficult to generalize. We therefore pose the challenge of evaluating the information quality of these semantic properties: can we find automatic methods that evaluate information quality as easily as accuracy, precision, and recall evaluate predictive power?
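One user-study design associated with the cited work~\cite{chang2009reading} is the word-intrusion task: subjects see a topic's top words plus one planted "intruder" word and pick the odd one out, and a topic scores well when most subjects find the intruder. A minimal sketch of that per-topic score (along the lines of the "model precision" used there) is below; the topic names and subject responses are made up for illustration.

```python
def model_precision(true_intruder, picks):
    """Fraction of subjects whose chosen odd-one-out word is the planted intruder."""
    if not picks:
        raise ValueError("need at least one subject response")
    return sum(pick == true_intruder for pick in picks) / len(picks)

# Hypothetical study data: topic id -> (planted intruder, each subject's pick).
responses = {
    "sports_topic": ("banana", ["banana", "banana", "goal", "banana"]),
    "cars_topic": ("senate", ["senate", "engine", "piston", "senate"]),
}
scores = {topic: model_precision(intruder, picks)
          for topic, (intruder, picks) in responses.items()}
print(scores)  # {'sports_topic': 0.75, 'cars_topic': 0.5}
```

Every score still requires human judgments for each topic, which is exactly the cost the challenge identifies; an automatic method would need to approximate such scores without the subjects.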

Challenges in ontology evaluation

Ontologies often supply the semantics for a number of Artificial Intelligence tools, acting as middleware, and can be used to make logical assertions. Ontologies can define objects and the relationships among them in any domain-specific system. Finding logic errors in complete ontologies proves largely impossible even for the most widely used reasoners, and logic is just one of numerous dimensions along which an ontology might be assessed. Therefore, we suggest that ontology evaluation is of limited value. Instead, we argue that the logical connections within ontologies should be tested during development by tools such as Scenario-based Ontology Evaluation (SCONE). We would change present tools so that domain experts can modify the ontology without knowing ontology languages or description logic, and so that ontology-based systems can support fuzzy matching based on ontologies that may be imperfect.

Unifying Data and Constraint Repairs

Integrity constraints play an important role in data design. However, in an operational database, they may not be enforced for many reasons, so over time the data may become inconsistent with respect to the constraints. To manage this, several approaches have proposed techniques to repair the data by finding minimal or lowest-cost changes that make it consistent with the constraints. Such techniques suit the old world, where data changes but schemas and their constraints remain fixed. In many modern applications, however, constraints may evolve over time as application or business rules change, as data is integrated with new data sources, or as the underlying semantics of the data evolves. In such settings, when an inconsistency occurs, it is no longer clear whether there is an error in the data (and the data should be repaired) or whether the constraints have evolved (and the constraints should be repaired). In this work, we present a novel unified cost model that allows data and constraint repairs to be compared on an equal footing. We consider repairs over a database that is inconsistent with respect to a set of rules, modeled as functional dependencies (FDs). FDs are the most common type of constraint and are known to play an important role in maintaining data quality. We evaluate the quality and scalability of our repair algorithms over synthetic data and present a qualitative case study using a well-known real dataset. The results show that our repair algorithms not only scale well for large datasets but also accurately capture and correct inconsistencies, and accurately decide when a data repair or a constraint repair is best.
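To make the FD setting concrete, here is a minimal sketch that detects violations of an FD X → A over rows stored as dictionaries and computes a deliberately naive data-repair cost: the number of cells to overwrite with each conflicting group's majority value. It is not the unified cost model the paper proposes; the function names and the zip/city rows are hypothetical. Comparing the FD zip → city with the extended FD (zip, country) → city hints at the trade-off the paper formalizes: extending the constraint absorbs one conflict, while the remaining one looks like a genuine data error.

```python
from collections import Counter, defaultdict

def fd_violations(rows, lhs, rhs):
    """Group rows by their LHS values; keep groups that disagree on the RHS value."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[attr] for attr in lhs)].append(row[rhs])
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}

def naive_data_repair_cost(rows, lhs, rhs):
    """Cells to change if each violating group is overwritten with its most common RHS value."""
    cost = 0
    for vals in fd_violations(rows, lhs, rhs).values():
        majority_count = Counter(vals).most_common(1)[0][1]
        cost += len(vals) - majority_count
    return cost

# Hypothetical rows: one typo plus one zip code reused across countries.
rows = [
    {"zip": "02139", "city": "Cambridge", "country": "US"},
    {"zip": "02139", "city": "Cambridge", "country": "US"},
    {"zip": "02139", "city": "Cambrigde", "country": "US"},
    {"zip": "10115", "city": "Berlin", "country": "DE"},
    {"zip": "10115", "city": "New York", "country": "US"},
]

print(naive_data_repair_cost(rows, ["zip"], "city"))             # FD zip -> city: prints 2
print(naive_data_repair_cost(rows, ["zip", "country"], "city"))  # extended FD: prints 1
```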

Veracity of Big Data: Challenges of Cross-modal Truth Discovery

In this challenge paper, we argue that the next generation of data management and data sharing systems needs to manage not only the volume and variety of Big Data but, most importantly, its veracity. Designing truth discovery systems requires a fundamental paradigm shift in data management that goes beyond adding new layers of data fusion heuristics or developing yet another probabilistic graphical truth discovery model. Actionable, Web-scale truth discovery requires a transdisciplinary approach that incorporates the dynamic and cross-modal dimensions of multi-layered networks of content and sources.
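For orientation, the kind of data fusion heuristic the paper argues is insufficient on its own can be sketched as iterative weighted voting: the chosen "true" value for each object and the reliability of each source are estimated alternately. This is a generic baseline written for illustration, not the authors' proposal; the function name and the claims are hypothetical.

```python
from collections import defaultdict

def truth_discovery(claims, iterations=10):
    """claims: list of (source, object, value). Returns (truths, source_weights)."""
    weights = {source: 1.0 for source, _, _ in claims}
    truths = {}
    for _ in range(iterations):
        # Step 1: for each object, pick the value with the highest total source weight.
        votes = defaultdict(lambda: defaultdict(float))
        for source, obj, value in claims:
            votes[obj][value] += weights[source]
        truths = {obj: max(vals, key=vals.get) for obj, vals in votes.items()}
        # Step 2: a source's weight becomes the fraction of its claims matching the truths.
        agree, total = defaultdict(int), defaultdict(int)
        for source, obj, value in claims:
            total[source] += 1
            agree[source] += truths[obj] == value
        weights = {source: agree[source] / total[source] for source in total}
    return truths, weights

# Hypothetical claims; the Brasilia/Rio tie is broken by learned source reliability.
claims = [
    ("A", "capital_AU", "Canberra"), ("B", "capital_AU", "Canberra"), ("C", "capital_AU", "Sydney"),
    ("A", "capital_CA", "Ottawa"), ("C", "capital_CA", "Toronto"), ("B", "capital_CA", "Ottawa"),
    ("C", "capital_BR", "Rio"), ("A", "capital_BR", "Brasilia"),
]
truths, weights = truth_discovery(claims)
print(truths, weights)
```

Such a model reasons only over a flat set of claims from a fixed set of sources, which is the single-modal, static setting the paper wants to move beyond.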

The Challenge of Improving Credibility of User-Generated Content in Online Social Networks

In every environment of information exchange, Information Quality (IQ) is considered one of the most important issues. Studies of Online Social Networks (OSNs) analyze a number of related subjects spanning both theoretical and practical aspects, from data quality identification and simple attribute classification to quality assessment models for various social environments. Among the factors that affect information quality in OSNs is the credibility of user-generated content. To address this challenge, proposed solutions include community-based evaluation and labeling of user-generated content in terms of accuracy, clarity, and timeliness, along with well-established real-time data mining techniques.


Publication Years 2009-2016
Publication Count 97
Citation Count 158
Available for Download 97
Downloads (6 weeks) 1258
Downloads (12 Months) 11778
Downloads (cumulative) 65603
Average downloads per article 676
Average citations per article 2
First Name Last Name Award
Ahmed Elmagarmid ACM Distinguished Member (2009)
Beth A. Plale ACM Senior Member (2006)

First Name Last Name Paper Counts
Yang Lee 4
Peter Christen 3
John Talburt 3
Stuart Madnick 3
Wolfgang Lehner 2
Ali Sunyaev 2
Nan Tang 2
Vassilios Verykios 2
Roman Lukyanenko 2
Ross Gayler 2
Eitel Lauría 2
Arnon Rosenthal 2
Dinusha Vatsalan 2
G Shankaranarayanan 2
Mario Mezzanzanica 1
Roberto Boselli 1
Sharad Mehrotra 1
Edward Anderson 1
Dov Biran 1
Shelly Sachdeva 1
Monica Tremblay 1
Stuart Madnick 1
Debra Vandermeer 1
John Krogstie 1
Banda Ramadan 1
Sandra Sampaio 1
Foster Provost 1
Jianyong Wang 1
Wenfei Fan 1
Dustin Lange 1
Roger Blake 1
Therese Williams 1
Chintan Amrit 1
Pierpaolo Vittorini 1
Karthikeyan Ramamurthy 1
Beth Plale 1
Ralf Tönjes 1
Laurent Lecornu 1
Chris Baillie 1
Peter Edwards 1
Douglas Hodson 1
John O’Donoghue 1
Lan Cao 1
Rashid Ansari 1
Arputharaj Kannan 1
Anupkumar Sen 1
Hubert Österle 1
Paolo Coletti 1
Huizhi Liang 1
Suzanne Embury 1
Lizhu Zhou 1
Erhard Rahm 1
Shuai Ma 1
Nigel Martin 1
Jean Caillec 1
Jeffrey Vaughan 1
Payam Barnaghi 1
Melanie Herschel 1
Davide Ceolin 1
Khoi Tran 1
Mirko Cesarini 1
Hongjiang Xu 1
Maurice Van Keulen 1
A Borthick 1
Benjamin Ngugi 1
Beverly Kahn 1
Paul Glowalla 1
Wenyuan Yu 1
Felix Naumann 1
Wenyuan Yu 1
María Bermúdez-Edo 1
Maria Alvarez 1
Dezhao Song 1
Rabia Nuray-Turan 1
Dmitri Kalashnikov 1
Yinle Zhou 1
Carolyn Matheus 1
Mohamed Yakout 1
Ashfaq Khokhar 1
Dmitry Chornyi 1
Danilo Montesi 1
Xiaobai Li 1
Eric Medvet 1
Fabiano Tarlao 1
Omar Alonso 1
Irit Askira Gelman 1
Alexandra Poulovassilis 1
Sara Tonelli 1
Kush Varshney 1
Jimeng Sun 1
Rahul Basole 1
Stephen Chong 1
Edoardo Pignotti 1
Paul Groth 1
Valentina Maccatrozzo 1
John Herbert 1
Juan Augusto 1
Maurice Mulvenna 1
Paul McCullagh 1
Fabian Panse 1
Fumiko Kobayashi 1
Johann Freytag 1
Fabio Mercorio 1
Richard Briotta 1
Kristin Weber 1
Panagiotis Ipeirotis 1
Paolo Missier 1
Xu Pu 1
Heiko Müller 1
Adir Even 1
Steven Brown 1
Terry Clark 1
H Nehemiah 1
Matthew Jensen 1
Daniel Dalip 1
Pável Calado 1
Tobias Vogel 1
Arvid Heise 1
Uwe Draisbach 1
Fons Wijnhoven 1
Youwei Cheah 1
Wan Fokkink 1
Jeffrey Fisher 1
Adriane Chapman 1
Jeremy Millar 1
Hilko Donker 1
Olivier Curé 1
Claire Collins 1
Eric Nelson 1
Hongwei Zhu 1
Nitin Joglekar 1
Ulf Leser 1
Irit Gelman 1
Paul Bowen 1
Michael Zack 1
Mikhail Atallah 1
Yanjuan Yang 1
Valerie Sessions 1
Trent Rosenbloom 1
Shawn Hardenbrook 1
Subhash Bhalla 1
D Elizabeth 1
Kaushik Dutta 1
M Kaiser 1
Jeffrey Parsons 1
Manoranjan Dash 1
Xiaoming Fan 1
Floris Geerts 1
Thomas Redman 1
David Becker 1
Wenfei Fan 1
Pim Dietz 1
Christan Grant 1
Dennis Wei 1
Aleksandra Mojsilović 1
Sherali Zeadally 1
Ali Khenchaf 1
Ion Todoran 1
Willem Van Hage 1
Len Seligman 1
Gilbert Peterson 1
Robert Ulbricht 1
Martin Hahmann 1
Norbert Ritter 1
Cihan Varol 1
Coşkun Bayrak 1
Craig Fisher 1
David Robb 1
Sufyan Ababneh 1
Peter Elkin 1
C Raj 1
Hema Meda 1
Amitava Bagchi 1
Matteo Magnani 1
Bernd Heinrich 1
Mathias Klier 1
Dirk Ahlers 1
Alberto Bartoli 1
Hongwei Zhu 1
R Greenwood 1
Ayush Singhania 1
George Moustakides 1
Bing Lv 1
Marcos Gonçalves 1
Paul Mangiameli 1
Jianing Wang 1
Daisy Zhe Wang 1
Rosella Gennari 1
Mark Braunstein 1
Marta Zarraga-Rodriguez 1
Archana Nottamkandath 1
Darryl Ahner 1
Claudio Hartmann 1
Hongwei Zhu 1
James McNaull 1
Kelly Janssens 1
Jeff Heflin 1
Theodore Speroff 1
Ahmed Elmagarmid 1
Michael Mannino 1
Fiona Rohde 1
Marco Valtorta 1
Elliot Fielstein 1
Yang Lee 1
Judee Burgoon 1
Boris Otto 1
Andrea Lorenzo 1
Richard Wang 1
Maurizio Murgia 1
Josh Attenberg 1
Alun Preece 1
Anja Klein 1
Marco Cristo 1
Marilyn Tremaine 1
Alan March 1
Felix Naumann 1
Kewei Sha 1
Christian Skalka 1

Affiliation Paper Counts
Federal University of Amazonas 1
Qatar Computing Research Institute 1
Vanderbilt University 1
Instituto Superior Tecnico 1
Google Inc. 1
University of Leipzig 1
Hospital Universitario Austral 1
Harvard University 1
University of Colorado at Denver 1
Oklahoma City University 1
University of Rhode Island 1
State University of New York at Albany 1
Georgia State University 1
University of Antwerp 1
University of Texas at Austin 1
Beihang University 1
University of Massachusetts System 1
Indian Institute of Science 1
Elsevier 1
University of Kentucky 1
University of Augsburg 1
University of South Carolina 1
Memorial University of Newfoundland 1
Boston University 1
Technical University of Munich 1
Butler University 1
Cardiff University 1
University of Massachusetts Boston 1
Sam Houston State University 1
University College Cork 1
University of Thessaly 1
Microsoft 1
Ben-Gurion University of the Negev 1
Charleston Southern University 1
Commonwealth Scientific and Industrial Research Organization 1
Rutgers University 1
University of Oklahoma 1
University of Patras 1
Hellenic Open University 1
Université Paris-Est 1
Florida State University 1
Lehigh University 2
Humboldt University of Berlin 2
Nanyang Technological University 2
Old Dominion University 2
Suffolk University 2
Free University of Bozen-Bolzano 2
University of Innsbruck 2
University of Arizona 2
Norwegian University of Science and Technology 2
University of Florida 2
University of Surrey 2
Indiana University 2
New York University 2
Massachusetts Institute of Technology 2
Babson College 2
University of Bologna 2
University of Hamburg 2
Federal University of Minas Gerais 2
University of Queensland 2
University of Aizu 2
Universidad de Navarra 2
University of Massachusetts Lowell 2
Indian Institute of Management Calcutta 2
University of Cologne 3
Telecom Bretagne 3
University of Edinburgh 3
Northeastern University 3
Purdue University 3
University of Aberdeen 3
Georgia Institute of Technology 3
University of Illinois at Chicago 3
University of St. Gallen 3
University of California, Irvine 3
Marist College 3
Birkbeck University of London 3
United States Department of Veterans Affairs 4
Anna University 4
University of Ulster 4
University of Twente 4
United States Air Force Institute of Technology 4
University of Trieste 4
Technical University of Dresden 4
IBM Thomas J. Watson Research Center 4
University of Milan - Bicocca 4
Vrije Universiteit Amsterdam 4
University of Manchester 4
Florida International University 5
MITRE Corporation 5
Tsinghua University 5
University of Arkansas at Little Rock 8
Australian National University 9