ACM Journal of Data and Information Quality (JDIQ)

Latest Articles

Information Quality Research Challenge

As information technology becomes an integral part of daily life, increasingly, people understand the world around them by turning to digital sources as opposed to directly interacting with objects in the physical world. This has ushered in the age of Ubiquitous Digital Intermediation (UDI). With the explosion of UDI, the scope of Information...

Combining User Reputation and Provenance Analysis for Trust Assessment

Trust is a broad concept that in many systems is often reduced to user reputation alone. However, user reputation is just one way to determine trust....


Jan. 2016 -- New book announcement


Carlo Batini and Monica Scannapieco have a new book:

Data and Information Quality: Dimensions, Principles and Techniques  

Springer Series: Data-Centric Systems and Applications; soon to be available from the Springer shop.

A Springer flyer for the book is also available.

Special issue on Web Data Quality

The goal of this special issue is to present innovative research in the areas of Web Data Quality Assessment and Web Data Cleansing. The editors of this special issue are Christian Bizer, Xin Luna Dong, Ihab Ilyas, and Maria-Esther Vidal. See the call for papers for more details.



New options for ACM authors to manage rights and permissions for their work

ACM introduces a new publishing license agreement, an updated copyright transfer agreement, and a new author-pays option which allows for perpetual open access through the ACM Digital Library. For more information, visit the ACM Author Rights webpage.


ICIQ 2015, the International Conference on Information Quality, will take place on July 24 at MIT in Cambridge, MA.

Experience and Challenge papers: JDIQ now accepts two new types of papers. Experience papers describe real-world applications, datasets, and other experiences in handling poor-quality data. Challenge papers briefly describe a novel problem or challenge for the IQ community. See the calls for papers for details.

Special Issue on Provenance and Quality of Data and Information: The term provenance refers broadly to information about the origin, context, derivation, lineage, ownership, or history of some artifact. The provenance of data is more specifically a form of structured metadata that records the activities involved in data production. The notion applies to a broad variety of data types, from database records to scientific datasets, business transaction logs, web pages, social media messages, and more. At the same time, different definitions and measures of quality apply to each of these data types in different domains.

The JDIQ guest editors are Paolo Missier (Newcastle University, UK) and Paolo Papotti (Qatar Computing Research Institute, Qatar).
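To make the idea of provenance as "structured metadata that records the activities involved in data production" concrete, here is a minimal sketch of an activity log that can answer lineage questions. The class and field names (Activity, ProvenanceLog, used, generated) are hypothetical and only loosely follow the entity/activity/agent split of W3C PROV; this is an illustration, not a model proposed by the special issue.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """One step of data production (hypothetical model): who ran it, what it read, what it wrote."""
    name: str
    agent: str
    used: list[str] = field(default_factory=list)
    generated: list[str] = field(default_factory=list)


class ProvenanceLog:
    """Append-only record of activities, loosely inspired by W3C PROV's entity/activity/agent split."""

    def __init__(self) -> None:
        self.activities: list[Activity] = []
        # Maps each artifact to the activity that produced it.
        self.generated_by: dict[str, Activity] = {}

    def record(self, activity: Activity) -> None:
        self.activities.append(activity)
        for out in activity.generated:
            self.generated_by[out] = activity

    def lineage(self, artifact: str) -> set[str]:
        """Return every upstream artifact the given artifact was (transitively) derived from."""
        seen: set[str] = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            act = self.generated_by.get(current)
            if act is None:
                continue  # a source artifact with no recorded producer
            for inp in act.used:
                if inp not in seen:
                    seen.add(inp)
                    stack.append(inp)
        return seen
```

For example, after recording a crawl that turns `site.html` into `raw.csv` and a cleaning step that reads `raw.csv` and `rules.json` to produce `clean.csv`, `lineage("clean.csv")` returns all three upstream files. A quality assessment could then, for instance, discount `clean.csv` if any file in its lineage is known to be unreliable.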

Forthcoming Articles

The Challenge of Quick and Dirty Information Quality

We present a new research challenge in quick and dirty information quality (IQ): quickly assessing the quality of information sources. We also describe its real-world importance and suggest research directions.

Data Quality Challenges in Distributed LVC Test Environments

Distributed live-virtual-constructive (LVC) simulation promises a number of benefits for the test and evaluation (T&E) community, including reduced costs, access to simulations of limited-availability assets, the ability to conduct large-scale multi-service test events, and recapitalization of existing simulation investments. As fully replicated, geographically distributed database applications designed to support interaction with live participants and real hardware, LVC simulations face a number of real-time constraints and engineering trade-offs. For instance, data must be replicated at each node to meet availability and responsiveness requirements. However, replication yields inconsistencies in entity and world state data and induces uncertainties in derived quantities such as weapons effectiveness. Assessing the impact of state inconsistency and quantifying the resulting measurement uncertainty are key challenges for T&E programs relying on distributed LVC simulation.

Automatic Discovery of Abnormal Values in Large Textual Databases

Textual databases are ubiquitous in many application domains. With online services, individuals are increasingly required to enter their personal details, for example when purchasing products online or registering for government services, while many social network and e-commerce sites allow users to post short comments. Many online sites leave open the possibility for people to enter unintended or malicious abnormal values, such as names with errors, bogus values, profane comments, or random character sequences. In other applications, such as online bibliographic databases or comparative online shopping sites, databases are increasingly populated in (semi-)automatic ways through web crawls. This practice can result in low-quality data being added automatically to a database. In this paper we develop three techniques to automatically discover abnormal values in large textual databases. Following recent work in categorical outlier detection, our assumption is that 'normal' values are those that occur frequently in a database, while an individual abnormal value is rare. Our techniques are unsupervised and treat the discovery of abnormal values as an outlier detection problem. Our first technique is a basic q-gram set-based technique, the second is based on a probabilistic language model, and the third employs morphological word features to train a one-class support vector machine classifier. Our aim is to investigate and develop techniques that are fast, efficient, and automatic. The output of our techniques can help in the development of rule-based data cleaning and information extraction systems, or be used as training data for further supervised data cleaning procedures.
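The core assumption above (normal values are built from material that recurs across the database, abnormal values are rare) can be illustrated with a minimal q-gram sketch. This is not the paper's q-gram set technique; it is a hypothetical scoring rule chosen for illustration: each distinct value is scored by the fraction of its padded q-grams that appear in no other distinct value, so a score near 1.0 flags a likely abnormal value. The function names and the `#` padding character are assumptions.

```python
from collections import Counter


def qgrams(value: str, q: int = 2) -> list[str]:
    """Overlapping q-grams of a lower-cased value, padded with '#' so word boundaries count."""
    padded = f"#{value.lower()}#"
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def abnormality_scores(values: list[str], q: int = 2) -> dict[str, float]:
    """Score each distinct value by how many of its q-grams are unique to it (0.0 to 1.0)."""
    distinct = set(values)
    # For every q-gram, count how many distinct values contain it.
    gram_counts: Counter[str] = Counter()
    for v in distinct:
        gram_counts.update(set(qgrams(v, q)))
    scores = {}
    for v in distinct:
        grams = set(qgrams(v, q))
        # Fraction of this value's q-grams that occur in no other distinct value.
        rare = sum(1 for g in grams if gram_counts[g] == 1)
        scores[v] = rare / max(len(grams), 1)
    return scores
```

On the column `["maria", "mario", "marta", "martin", "xqzvk"]`, every bigram of `xqzvk` is unique, so it scores 1.0, while `maria` shares all but one of its six bigrams with other names and scores 1/6. A real system would need a threshold or ranking cut-off, which this sketch leaves to the caller.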

Unifying Data and Constraint Repairs

Integrity constraints play an important role in data design. However, in an operational database, they may not be enforced for many reasons. Hence, over time, data may become inconsistent with respect to the constraints. To manage this, several approaches have proposed techniques to repair the data by finding minimal or lowest-cost changes to the data that make it consistent with the constraints. Such techniques are appropriate for the old world, where data changes but schemas and their constraints remain fixed. In many modern applications, however, constraints may evolve over time as application or business rules change, as data is integrated with new data sources, or as the underlying semantics of the data evolves. In such settings, when an inconsistency occurs, it is no longer clear whether there is an error in the data (and the data should be repaired) or whether the constraints have evolved (and the constraints should be repaired). In this work, we present a novel unified cost model that allows data and constraint repairs to be compared on an equal footing. We consider repairs over a database that is inconsistent with respect to a set of rules, modeled as functional dependencies (FDs). FDs are the most common type of constraint and are known to play an important role in maintaining data quality. We evaluate the quality and scalability of our repair algorithms over synthetic data and present a qualitative case study using a well-known real dataset. The results show that our repair algorithms not only scale well for large datasets, but are also able to accurately capture and correct inconsistencies, and accurately decide when a data repair versus a constraint repair is best.
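The trade-off between repairing data and repairing a constraint can be illustrated with a toy cost comparison for a single FD X → Y. This is not the paper's unified cost model: the weights `cell_cost` and `attr_cost`, the restriction to constraint repairs that add one attribute to the left-hand side, and the function names are all hypothetical choices made for illustration. A data repair overwrites the minority Y values within each group of rows that agree on X; a constraint repair refines X and pays for any violations that remain.

```python
from collections import Counter, defaultdict


def data_repair_cost(rows: list[dict[str, str]], lhs: list[str], rhs: str) -> int:
    """Minimum number of rhs cells to overwrite so that the FD lhs -> rhs holds.

    Within each group of rows agreeing on lhs, keep the most common rhs value
    and count every other row in the group as one cell change.
    """
    groups: dict[tuple, Counter] = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[a] for a in lhs)][r[rhs]] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def best_repair(rows, lhs, rhs, candidates, cell_cost=1.0, attr_cost=2.0):
    """Compare a pure data repair with constraint repairs that add one attribute to lhs.

    cell_cost and attr_cost are hypothetical weights; ties favour the data repair.
    """
    options = [("data", lhs, data_repair_cost(rows, lhs, rhs) * cell_cost)]
    for extra in candidates:
        new_lhs = lhs + [extra]
        # A refined FD may still be violated; remaining violations are fixed in the data.
        cost = attr_cost + data_repair_cost(rows, new_lhs, rhs) * cell_cost
        options.append(("constraint", new_lhs, cost))
    return min(options, key=lambda o: o[2])
```

With illustrative rows where postal code 1000 maps to Brussels three times (country BE) and to Lausanne three times (country CH), `best_repair(rows, ["zip"], "city", ["country"])` prefers refining the FD to zip, country → city at cost 2.0 over rewriting three cells. If instead the only conflict is a single misspelled "Brusels" among Belgian rows, the data repair at cost 1.0 wins.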

Combining User Reputation and Provenance Analysis for Trust Assessment

Data and Analytics Challenges for a Learning Healthcare System

Digital health data is both big and wide. We discuss three distinct challenges in applying data analytics toward the development of a learning healthcare system: data access, data curation, and development of new analytic techniques. We conclude with some interim approaches and future opportunities.


Publication Years 2009-2016
Publication Count 91
Citation Count 139
Available for Download 91
Downloads (6 weeks) 1211
Downloads (12 Months) 10819
Downloads (cumulative) 62279
Average downloads per article 684
Average citations per article 2
First Name Last Name Award
Mikhail Atallah ACM Fellows (2006)
Ahmed Elmagarmid ACM Fellows (2012); ACM Distinguished Member (2009)
Wenfei Fan ACM Fellows (2012)
Beth A. Plale ACM Senior Member (2006)

First Name Last Name Paper Counts
Stuart Madnick 3
John Talburt 3
Yang Lee 3
Ali Sunyaev 2
Vassilios Verykios 2
Eitel Lauría 2
Nan Tang 2
Peter Christen 2
G Shankaranarayanan 2
Chris Baillie 1
Peter Edwards 1
Beth Plale 1
Banda Ramadan 1
John Krogstie 1
Wenfei Fan 1
Dustin Lange 1
Therese Williams 1
Mario Mezzanzanica 1
Roberto Boselli 1
Pierpaolo Vittorini 1
Karthikeyan Ramamurthy 1
Laurent Lecornu 1
Ralf Tönjes 1
Edward Anderson 1
Chintan Amrit 1
Sharad Mehrotra 1
Dov Biran 1
Sandra Sampaio 1
Roger Blake 1
Jianyong Wang 1
Shelly Sachdeva 1
Stuart Madnick 1
Monica Tremblay 1
Debra Vandermeer 1
Foster Provost 1
Roman Lukyanenko 1
Jeffrey Vaughan 1
Melanie Herschel 1
Paolo Coletti 1
Huizhi Liang 1
Erhard Rahm 1
Jean Caillec 1
Payam Barnaghi 1
John O’Donoghue 1
Shuai Ma 1
Nigel Martin 1
Lan Cao 1
Suzanne Embury 1
Arputharaj Kannan 1
Lizhu Zhou 1
Rashid Ansari 1
Hubert Österle 1
Anupkumar Sen 1
Stephen Chong 1
Edoardo Pignotti 1
Fabiano Tarlao 1
Eric Medvet 1
Hongjiang Xu 1
Sara Tonelli 1
Kush Varshney 1
Mirko Cesarini 1
Rahul Basole 1
Jimeng Sun 1
Alexandra Poulovassilis 1
Maurice Van Keulen 1
A Borthick 1
Mohamed Yakout 1
Carolyn Matheus 1
Ashfaq Khokhar 1
Irit Askira Gelman 1
Dmitry Chornyi 1
Danilo Montesi 1
Omar Alonso 1
Xiaobai Li 1
Paul Glowalla 1
Wenyuan Yu 1
Fabio Mercorio 1
María Bermúdez-Edo 1
Maria Alvarez 1
Felix Naumann 1
John Herbert 1
Wenyuan Yu 1
Fabian Panse 1
Fumiko Kobayashi 1
Johann Freytag 1
Paolo Missier 1
Juan Augusto 1
Maurice Mulvenna 1
Paul Mccullagh 1
Richard Briotta 1
Xu Pu 1
Benjamin Ngugi 1
Beverly Kahn 1
Kristin Weber 1
Panagiotis Ipeirotis 1
Youwei Cheah 1
Tobias Vogel 1
Arvid Heise 1
Uwe Draisbach 1
Fons Wijnhoven 1
Dezhao Song 1
Yinle Zhou 1
Heiko Müller 1
Rabia Nuray-Turan 1
Dmitri Kalashnikov 1
H Nehemiah 1
Terry Clark 1
Adir Even 1
Steven Brown 1
Matthew Jensen 1
Daniel Dalip 1
Pável Calado 1
Floris Geerts 1
David Becker 1
Thomas Redman 1
Christan Grant 1
Dennis Wei 1
Aleksandra Mojsilović 1
Sherali Zeadally 1
Ion Todoran 1
Ali Khenchaf 1
Eric Nelson 1
Hongwei Zhu 1
Nitin Joglekar 1
Ulf Leser 1
Irit Gelman 1
Paul Bowen 1
Manoranjan Dash 1
Claire Collins 1
Olivier Curé 1
Wenfei Fan 1
Pim Dietz 1
Michael Zack 1
Mikhail Atallah 1
Yanjuan Yang 1
D Elizabeth 1
Xiaoming Fan 1
Valerie Sessions 1
Trent Rosenbloom 1
Shawn Hardenbrook 1
Subhash Bhalla 1
Kaushik Dutta 1
M Kaiser 1
Jeffrey Parsons 1
Dirk Ahlers 1
Alberto Bartoli 1
Ross Gayler 1
Dinusha Vatsalan 1
Rosella Gennari 1
Daisy Zhe Wang 1
Mark Braunstein 1
Marta Zarraga-Rodriguez 1
Jianing Wang 1
Norbert Ritter 1
Cihan Varol 1
Coşkun Bayrak 1
David Robb 1
R Greenwood 1
Ayush Singhania 1
Wolfgang Lehner 1
C Raj 1
George Moustakides 1
Paul Mangiameli 1
Craig Fisher 1
Bing Lv 1
Sufyan Ababneh 1
Peter Elkin 1
Hema Meda 1
Amitava Bagchi 1
Matteo Magnani 1
Marcos Gonçalves 1
Bernd Heinrich 1
Mathias Klier 1
Hongwei Zhu 1
Christian Skalka 1
Maurizio Murgia 1
Andrea Lorenzo 1
Felix Naumann 1
Kewei Sha 1
Jeff Heflin 1
Alun Preece 1
Anja Klein 1
Kelly Janssens 1
James McNaull 1
Ahmed Elmagarmid 1
Michael Mannino 1
Fiona Rohde 1
Marilyn Tremaine 1
Marco Valtorta 1
Elliot Fielstein 1
Theodore Speroff 1
Yang Lee 1
Alan March 1
Judee Burgoon 1
Marco Cristo 1
Boris Otto 1
Josh Attenberg 1
Richard Wang 1

Affiliation Paper Counts
Federal University of Amazonas 1
Qatar Computing Research Institute 1
Vanderbilt University 1
Instituto Superior Tecnico 1
Google Inc. 1
University of Leipzig 1
Hospital Universitario Austral 1
Harvard University 1
University of Colorado at Denver 1
Oklahoma City University 1
University of Rhode Island 1
State University of New York at Albany 1
Georgia State University 1
MITRE Corporation 1
University of Antwerp 1
University of Texas at Austin 1
Beihang University 1
University of Massachusetts System 1
Indian Institute of Science 1
University of Kentucky 1
University of Augsburg 1
University of South Carolina 1
Technical University of Dresden 1
Memorial University of Newfoundland 1
Boston University 1
Technical University of Munich 1
Butler University 1
Cardiff University 1
University of Massachusetts Boston 1
Sam Houston State University 1
University College Cork 1
University of Thessaly 1
Microsoft 1
Ben-Gurion University of the Negev 1
Charleston Southern University 1
Commonwealth Scientific and Industrial Research Organization 1
Rutgers University 1
University of Oklahoma 1
University of Patras 1
University of Massachusetts Lowell 1
Hellenic Open University 1
Université Paris-Est 1
Florida State University 1
Lehigh University 2
Humboldt University of Berlin 2
Nanyang Technological University 2
Old Dominion University 2
Suffolk University 2
Free University of Bozen-Bolzano 2
University of Innsbruck 2
University of Arizona 2
Norwegian University of Science and Technology 2
University of Florida 2
University of Surrey 2
Indiana University 2
New York University 2
Massachusetts Institute of Technology 2
Babson College 2
University of Bologna 2
University of Hamburg 2
Northeastern University 2
Federal University of Minas Gerais 2
University of Queensland 2
University of Aizu 2
Universidad de Navarra 2
Indian Institute of Management Calcutta 2
Marist College 3
University of Cologne 3
Telecom Bretagne 3
University of Illinois at Chicago 3
Purdue University 3
University of California, Irvine 3
Georgia Institute of Technology 3
University of Edinburgh 3
University of St. Gallen 3
University of Aberdeen 3
Birkbeck University of London 3
United States Department of Veterans Affairs 4
Anna University 4
University of Ulster 4
University of Twente 4
University of Trieste 4
IBM Thomas J. Watson Research Center 4
Florida International University 4
University of Milan - Bicocca 4
University of Manchester 4
Australian National University 5
Tsinghua University 5
University of Arkansas at Little Rock 8