ACM Journal of Data and Information Quality (JDIQ)

Latest Articles


Automated Quality Assessment of Metadata across Open Data Portals

The Open Data movement has become a driver for publicly available data on the Web. More and more data, from governments and public institutions...

Towards More Accurate Statistical Profiling of Deployed Microdata

Being promoted by major search engines such as Google, Yahoo!, Bing, and Yandex, Microdata embedded in web pages, especially using,...

Luzzu—A Methodology and Framework for Linked Data Quality Assessment

The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this...


Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of the approaches addressing these problems focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that...


Nov. 2016 -- Call for Papers: Special Issue on Improving the Veracity and Value of Big Data. Submission deadline: Friday, March 3, 2017


Jan. 2016 -- New book announcement


Carlo Batini and Monica Scannapieco have a new book:

Data and Information Quality: Dimensions, Principles and Techniques 

Springer Series: Data-Centric Systems and Applications, soon available from the Springer shop

The Springer flyer is available here

Experience and Challenge papers: JDIQ now accepts two new types of papers. Experience papers describe real-world applications, datasets, and other experiences in handling poor-quality data. Challenge papers briefly describe a novel problem or challenge for the IQ community. See Author Guidelines for details.

Forthcoming Articles

Reproducibility Challenges in Information Retrieval Evaluation

Information Retrieval is concerned with complex systems, such as search engines or intellectual property search systems, whose performance needs to be evaluated in terms of effectiveness, i.e., their ability to properly rank documents in response to imprecise user queries. Experimental evaluation is central to the progress of the field, but several factors impair the reproducibility of the experiments conducted. This paper briefly summarizes the current status of reproducibility in the IR field, highlights the major challenges, and outlines some solutions and future directions for improved and more systematic reproducibility.

The Challenge of Test Data Quality in Data Processing

The need for robust test data sets with test oracles presents challenging questions in data and information quality research. The profound lack of high-quality test data sets to enable the dynamic testing of data processing components highlights open research challenges in data quality related to (1) sample data quality, (2) test data synthesis and (3) quality models.

A Probabilistically Integrated System for Crowd-Assisted Text Labeling and Extraction

The amount of text data has been growing exponentially in recent years. The output of state-of-the-art statistical text extraction methods over this data is likely to contain errors. Recent work has shown that probabilistic databases can store and query uncertainty over extraction results; however, these systems do not natively reduce error. In this paper we propose pi-CASTLE, a system that uses a probabilistic database as an anchor to execute, optimize, and integrate machine and human computing. Uncertain fields are crowdsourced with the goal of reducing uncertainty and improving accuracy. We use information theory to optimize the set of questions and a Bayesian probabilistic model to integrate uncertain crowd answers back into the database. Experiments show promising results in significantly reducing machine error using very small amounts of crowdsourced human input. Additionally, probabilistic integration is shown to resolve conflicting crowd answers more effectively and to give users the flexibility to tune the desired trade-off between accuracy and recall according to the needs of applications. Using crowds to assist machine-learned models proves to be a cost-effective way to close the last mile in terms of accuracy for text labeling and extraction tasks.
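
The two ideas named in this abstract, choosing which uncertain fields to crowdsource by an information-theoretic criterion and folding crowd answers back in with a Bayesian model, can be illustrated with a minimal sketch. This is not pi-CASTLE's implementation: the function names, the use of plain Shannon entropy as the selection criterion, the fixed worker accuracy of 0.8, and the symmetric-noise worker model are all assumptions made for illustration.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a label distribution {label: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def pick_most_uncertain(fields, k=1):
    """Return the ids of the k fields with the highest-entropy label distributions.

    `fields` maps a (hypothetical) field id to its label distribution.
    """
    return sorted(fields, key=lambda f: entropy(fields[f]), reverse=True)[:k]


def bayes_update(prior, answer, accuracy=0.8):
    """Posterior over labels after one crowd answer.

    Assumed worker model: the worker reports the true label with probability
    `accuracy` and otherwise picks uniformly among the remaining labels.
    """
    if len(prior) < 2:
        raise ValueError("need at least two candidate labels")
    n_other = len(prior) - 1
    unnormalized = {}
    for label, p in prior.items():
        likelihood = accuracy if label == answer else (1 - accuracy) / n_other
        unnormalized[label] = p * likelihood
    total = sum(unnormalized.values())
    return {label: v / total for label, v in unnormalized.items()}


# Hypothetical extraction output: one ambiguous field, one confident field.
fields = {
    "token_17": {"TITLE": 0.5, "AUTHOR": 0.5},
    "token_42": {"YEAR": 0.95, "PAGES": 0.05},
}
to_ask = pick_most_uncertain(fields)            # the ambiguous field is asked first
posterior = bayes_update(fields[to_ask[0]], "TITLE")
```

Under these assumptions, a single crowd answer moves the ambiguous field from an even split toward the answered label, and repeated or conflicting answers can be applied by calling the update again on the posterior.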

From Content to Context: The Evolution and Growth of Data Quality Research

Research in data and information quality has made significant strides over the last twenty years. It has become a unified body of knowledge incorporating techniques, methods, and applications from a variety of disciplines including information systems, computer science, operations management, organizational behavior, psychology, and statistics. With organizations viewing Big Data, social media data, data-driven decision-making, and analytics as critical, data quality has never been more important. We believe that data quality research is reaching the threshold of significant growth and a metamorphosis from a focus on measuring and assessing data quality (content) toward a focus on usage and context. At this stage, it is vital to understand the identity of this research area in order to recognize its current state and to effectively identify an increasing number of research opportunities within it. Using Latent Semantic Analysis (LSA) to analyze the abstracts of 972 peer-reviewed journal and conference articles published over the past 20 years, this paper contributes by identifying the core topics and themes that define the identity of data quality research. It further explores their trends over time, pointing to the data quality dimensions that have, and have not, been well studied, and offering insights into topics that may provide significant opportunities in this area.

First Name Last Name Paper Counts
Yang Lee 4
Peter Christen 3
John Talburt 3
Stuart Madnick 3
Ross Gayler 2
Dinusha Vatsalan 2
Wolfgang Lehner 2
Ali Sunyaev 2
Nan Tang 2
G Shankaranarayanan 2
Roman Lukyanenko 2
Vassilios Verykios 2
Sherali Zeadally 2
Eitel Lauría 2
Xiaobai Li 2
Arnon Rosenthal 2
Mario Mezzanzanica 1
Roberto Boselli 1
Sören Auer 1
Christoph Lange 1
Luvai Motiwalla 1
Sandra Geisler 1
Daniel Katz 1
Douglas Hodson 1
Sharad Mehrotra 1
Dov Biran 1
Edward Anderson 1
Chris Baillie 1
Peter Edwards 1
Beth Plale 1
Pierpaolo Vittorini 1
Karthikeyan Ramamurthy 1
Ralf Tönjes 1
Laurent Lecornu 1
Shelly Sachdeva 1
Stuart Madnick 1
Monica Tremblay 1
Debra Vandermeer 1
John Krogstie 1
Banda Ramadan 1
Foster Provost 1
Sandra Sampaio 1
Wenfei Fan 1
Therese Williams 1
Chintan Amrit 1
Jianyong Wang 1
Anja Klein 1
Marilyn Tremaine 1
Alan March 1
Marco Cristo 1
Felix Naumann 1
Richard Wang 1
Alun Preece 1
Fiona Rohde 1
Ahmed Elmagarmid 1
Michael Mannino 1
Kewei Sha 1
Elliot Fielstein 1
Theodore Speroff 1
Marco Valtorta 1
Yang Lee 1
Judee Burgoon 1
Boris Otto 1
Andrea Lorenzo 1
Maurizio Murgia 1
Josh Attenberg 1
Dmitry Chornyi 1
Ashfaq Khokhar 1
Danilo Montesi 1
Eric Medvet 1
Fabiano Tarlao 1
Irit Askira Gelman 1
Alexandra Poulovassilis 1
Omar Alonso 1
John Herbert 1
Juan Augusto 1
Maurice Mulvenna 1
Paul McCullagh 1
Fabio Mercorio 1
Fei Chiang 1
Siddharth Sitaramachandran 1
J Jha 1
Laure Berti-Équille 1
Sven Weber 1
Fabian Panse 1
Fumiko Kobayashi 1
Richard Briotta 1
Johann Freytag 1
María Bermúdez-Edo 1
Maria Alvarez 1
Kristin Weber 1
Panagiotis Ipeirotis 1
Paolo Missier 1
Benjamin Ngugi 1
Beverly Kahn 1
Paul Glowalla 1
Wenyuan Yu 1
Wenyuan Yu 1
Xu Pu 1
Felix Naumann 1
Fausto Giunchiglia 1
Jeremy Debattista 1
Sushovan De 1
Dominique Ritze 1
Heiko Paulheim 1
Christoph Quix 1
Matthias Jarke 1
Wan Fokkink 1
Jeffrey Fisher 1
Adriane Chapman 1
Giannis Haralabopoulos 1
Sebastian Neumaier 1
Kyle Niemeyer 1
Arfon Smith 1
Archana Nottamkandath 1
Darryl Ahner 1
Claudio Hartmann 1
Norbert Ritter 1
Hongwei Zhu 1
Cihan Varol 1
Coşkun Bayrak 1
David Robb 1
Rosella Gennari 1
Daisy Zhe Wang 1
Mark Braunstein 1
Marta Zárraga-Rodríguez 1
Craig Fisher 1
Peter Elkin 1
C Raj 1
Sufyan Ababneh 1
Matteo Magnani 1
Hema Meda 1
Amitava Bagchi 1
Bernd Heinrich 1
Mathias Klier 1
Dirk Ahlers 1
Alberto Bartoli 1
R Greenwood 1
Ayush Singhania 1
George Moustakides 1
Marcos Gonçalves 1
Jianing Wang 1
Bing Lv 1
Paul Mangiameli 1
Hongwei Zhu 1
James McNaull 1
Kelly Janssens 1
Judith Gelernter 1
Mouhamadou Lamine Ba 1
Ciro D'Urso 1
Subbarao Kambhampati 1
Hua Zheng 1
Jeff Heflin 1
Christian Skalka 1
Roger Blake 1
Dustin Lange 1
John O’Donoghue 1
Axel Polleres 1
Venkata Meduri 1
Wenjun Li 1
Davide Ceolin 1
Khoi Tran 1
Lan Cao 1
Jeffrey Vaughan 1
Melanie Herschel 1
Payam Barnaghi 1
Jean Caillec 1
Arputharaj Kannan 1
Rashid Ansari 1
Hubert Österle 1
Anupkumar Sen 1
Huizhi Liang 1
Paolo Coletti 1
Suzanne Embury 1
Erhard Rahm 1
Shuai Ma 1
Nigel Martin 1
Lizhu Zhou 1
Mirko Cesarini 1
Hongjiang Xu 1
Vincenzo Maltese 1
Jürgen Umbrich 1
Yuheng Hu 1
Yi Chen 1
Robert Meusel 1
Xiaoping Liu 1
Fred Morstatter 1
Valentina Maccatrozzo 1
Paul Groth 1
Maurice Van Keulen 1
Stephen Chong 1
Edoardo Pignotti 1
A Borthick 1
Mohamed Yakout 1
Sara Tonelli 1
Kush Varshney 1
Rahul Basole 1
Jimeng Sun 1
Carolyn Matheus 1
Fons Wijnhoven 1
Tobias Vogel 1
Arvid Heise 1
Uwe Draisbach 1
Olivier Curé 1
Claire Collins 1
Ioannis Anagnostopoulos 1
Patricia Franklin 1
Huan Liu 1
Willem Van Hage 1
Len Seligman 1
Gilbert Peterson 1
Robert Ulbricht 1
Martin Hahmann 1
Peter Aiken 1
Eric Nelson 1
Hongwei Zhu 1
Michael Zack 1
Nitin Joglekar 1
Paul Bowen 1
Mikhail Atallah 1
Yanjuan Yang 1
Ulf Leser 1
Irit Gelman 1
Christan Grant 1
Dennis Wei 1
Aleksandra Mojsilović 1
Ion Todoran 1
Ali Khenchaf 1
Trent Rosenbloom 1
Shawn Hardenbrook 1
Subhash Bhalla 1
D Elizabeth 1
Valerie Sessions 1
Kaushik Dutta 1
M Kaiser 1
Jeffrey Parsons 1
Manoranjan Dash 1
Floris Geerts 1
Thomas Redman 1
David Becker 1
Wenfei Fan 1
Pim Dietz 1
Xiaoming Fan 1
Jeremy Millar 1
Hilko Donker 1
Dezhao Song 1
Rabia Nuray-Turan 1
Dmitri Kalashnikov 1
Yinle Zhou 1
Youwei Cheah 1
Heiko Müller 1
Adir Even 1
Steven Brown 1
Terry Clark 1
H Nehemiah 1
Matthew Jensen 1
Daniel Dalip 1
Pável Calado 1

Affiliation Paper Counts
University of Illinois at Urbana-Champaign 1
Qatar Computing Research Institute 1
Florida State University 1
Virginia Commonwealth University 1
Vanderbilt University 1
Instituto Superior Técnico 1
Google Inc. 1
University of Leipzig 1
Hospital Universitario Austral 1
Harvard University 1
University of Colorado at Denver 1
Oklahoma City University 1
University of Rhode Island 1
State University of New York at Albany 1
Georgia State University 1
University of Antwerp 1
University of Texas at Austin 1
Oregon State University 1
Beihang University 1
University of Massachusetts System 1
Indian Institute of Science 1
Elsevier 1
University of Augsburg 1
University of South Carolina 1
Memorial University of Newfoundland 1
Boston University 1
Technical University of Munich 1
Butler University 1
New Jersey Institute of Technology 1
National Institute of Standards and Technology 1
Cardiff University 1
University of Massachusetts Boston 1
Sam Houston State University 1
University College Cork 1
Microsoft 1
Ben-Gurion University of the Negev 1
Charleston Southern University 1
Commonwealth Scientific and Industrial Research Organization 1
Rutgers, The State University of New Jersey 1
University of Oklahoma 1
University of Patras 1
Hellenic Open University 1
Université Paris-Est 1
Federal University of Amazonas 1
Lehigh University 2
Humboldt University of Berlin 2
Fraunhofer Institute for Applied Information Technology 2
Nanyang Technological University 2
Old Dominion University 2
Suffolk University 2
Free University of Bozen-Bolzano 2
University of Innsbruck 2
University of Arizona 2
Norwegian University of Science and Technology 2
University of Florida 2
University of Kentucky 2
University of Trento 2
RWTH Aachen University 2
University of Surrey 2
Indiana University 2
New York University 2
Massachusetts Institute of Technology 2
Babson College 2
University of Bologna 2
University of Hamburg 2
Federal University of Minas Gerais 2
University of Queensland 2
University of Aizu 2
McMaster University 2
Universidad de Navarra 2
Indian Institute of Management Calcutta 2
Hamad bin Khalifa University 2
Vienna University of Economics and Business Administration 3
University of Massachusetts Medical School 3
Northeastern University 3
University of St. Gallen 3
University of Edinburgh 3
University of Thessaly 3
Marist College 3
University of Cologne 3
Georgia Institute of Technology 3
University of Aberdeen 3
Telecom Bretagne 3
Purdue University 3
Birkbeck University of London 3
University of Bonn 3
University of California, Irvine 3
University of Mannheim 3
University of Illinois at Chicago 4
IBM Thomas J. Watson Research Center 4
University of Manchester 4
United States Air Force Institute of Technology 4
University of Twente 4
Vrije Universiteit Amsterdam 4
University of Ulster 4
Anna University 4
Technical University of Dresden 4
University of Trieste 4
United States Department of Veterans Affairs 4
University of Milan - Bicocca 4
Tsinghua University 5
MITRE Corporation 5
Florida International University 5
Arizona State University 5
University of Massachusetts Lowell 5
University of Arkansas at Little Rock 8
Australian National University 9