Data Quality Archives - SD Times
https://sdtimes.com/category/data-quality/

Report: 75% of developers say they’re responsible for data quality (Aug. 9, 2021)

Nearly three-quarters of developers say they are responsible for managing the quality of the data they use in their applications, a key finding in the 2nd SD Times Data Quality Survey, completed in conjunction with data management provider Melissa in July.

In last year’s survey, the number of developers claiming this responsibility was less than 50%, supporting the notion that the role of software developers has expanded beyond writing code. 

As organizations move security, testing, governance and even marketing and finance earlier into the application life cycle, developers are squeezed for time by ever-shrinking delivery timelines, and data quality often remains a “hope it’s right” afterthought to development teams.

Among the other key findings is that the top problem development teams face is inconsistency of the data they need to utilize, followed closely by incomplete data and old/incorrect data. Last year’s top choice, duplicate data, fell to fourth this year. Misfielded data and international character sets round out the list.

Because of these data problems, respondents to the survey said they spend about 10 hours per week dealing with data quality issues, taking time from building new applications.

Despite these problems, some 83% of respondents claimed their organizations are either data proficient or data aware, while only 15% say they are data savvy and around 2% describe themselves as data driven.

“Data is critical to the success of organizations worldwide, and to find that such a small number consider themselves savvy or data driven is somewhat alarming,” said David Lyman, publisher of SD Times. “With the world moving forward on data privacy and governance, to see organizations still failing to maintain their data should be a wakeup call for the industry at large.”

James Royster, the head of analytics at Adamas Pharmaceuticals and formerly the senior director of analytics and data strategy for biopharmaceutical company Celgene, said a big problem organizations face with their data is that there are “thousands of nuances” in big sets of data.

Royster gave an example of IQVIA, a health care data connectivity solutions provider, which collects data from more than 60,000 pharmacies, each dispensing hundreds or thousands of drugs, serums and more. On top of that, they service hospitals and doctors’ offices. So, he explained, “there are millions of potential points of error.” And in order to create these datasets, companies have to have developers write code that brings the data sets together in a way the business can digest. And that’s an ongoing process. “So as they’re changing code, updating code, collecting data, whatever it is, there’s millions of opportunities for things to go wrong.”

But data issues don’t occur only in large organizations. Smaller companies also have problems with data, as they don’t have the resources to properly collect the data they need and monitor it for changes, beyond someone in the database contacting them to say that something in their data has changed.

As an example, smaller companies might use a form to collect data from users, but many users provide bad data to avoid unwanted contact. The problem, Royster said, is that “there’s nobody checking it or aggregating it or applying any sort of logic to it to say, this is how this should be. It’s just data goes in … data comes out. And if that data that goes in is incorrect, what comes out is incorrect.”

 

Data quality: It’s a matter of trust (Nov. 12, 2020)

Businesses rely on data to make decisions that drive their bottom line. But if they can’t trust the data, or the analysis of the data, they lose the ability to move with more certainty that what they’re doing is correct.

Data quality has many different inputs and dimensions. IDC research director Stewart Bond said that among them are data accuracy, duplication, consistency, correctness and context. And the level of data quality that is available within an organization is going to change. Further, working with internal data is different from working with external data you receive as inputs. “So,” Bond said, “I don’t know if there’s a really good answer” to the breadth and depth of the data quality problem.

RELATED CONTENT:
The SD Times Data Quality Project
Data is key to returns on investment
The first step to ensuring data quality is validation

He went on to say that even if the data quality level is good, many of the data analytics tools inherently have some sort of human bias, because they’re going to be skewed by what the data teams in an organization want to get out of the data or expect to get out of the data.

“We’ve heard stories about two people showing up at an executive meeting with two different results coming in from supposedly the same set of data,” Bond said. “That can really erode the trust that is in the data, and the analytics of the data.”

Storing data in the cloud also presents challenges when it comes to trust, Bond explained, because every SaaS application has to have its own copy of the people, places and things the organization most cares about. “I liken this back to the game some people call the telephone game. It’s when a group of people sit in a circle, or they’re standing in line, and the first person whispers a phrase to the person next to them. When you get to the end, what happens is the story changes, or the phrase changes, and so you have that same potential issue with every single copy of data that’s created. And so that comes into that data quality calculation and estimation as well.”

At the beginning of the SD Times Data Quality Project, I described issues of data quality as being the industry’s “dirty little secret.” But organizations such as IDC have been able to pull the curtain back on this, and a recent survey of 300 people who do data analytics with business intelligence and dashboarding tools showed that only 10% of the respondents said the quality of their data — or their trust in that data — was not a challenge at all, Bond said. “This means that 90% have some level of concern in trusting the quality of their data.” 

While Bond said he doesn’t think the industry will have pure, pristine, 100% clean data, he did say that if organizations know the level of their data quality — the data quality score — they can bring that score into their algorithms as a statistical level of confidence. Those who have done so, he noted, have found “a tremendous improvement in the success of how they’re analyzing that data. So then that gives some guidance as to how and where you can use the results of those analytics in your decision-making.”
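
Bond’s point lends itself to a small illustration: if a dataset carries a known quality score, that score can discount the confidence attached to any metric computed from it. The TypeScript sketch below is hypothetical; the 0-to-1 qualityScore, the field names and the sample-size heuristic are assumptions, not an IDC or vendor methodology.

```typescript
// Hypothetical sketch: carrying a data quality score into an analysis.
// The 0-to-1 qualityScore, field names and sample-size heuristic are
// illustrative assumptions, not a published methodology.
interface ScoredDataset<T> {
  rows: T[];
  qualityScore: number; // 0 = no trust in the data, 1 = fully validated
}

interface Metric {
  value: number;
  confidence: number; // confidence discounted by the data quality score
}

function meanWithConfidence(ds: ScoredDataset<{ amount: number }>): Metric {
  const mean =
    ds.rows.reduce((sum, r) => sum + r.amount, 0) / Math.max(ds.rows.length, 1);
  // Whatever confidence the sample size alone would justify gets
  // multiplied down by the known quality of the underlying data.
  const sampleConfidence = Math.min(1, ds.rows.length / 1000);
  return { value: mean, confidence: sampleConfidence * ds.qualityScore };
}
```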

Data is key to returns on investment (Oct. 30, 2020)

Return Policy Guide is a website that aggregates the policies of many major retailers, so if a customer is unhappy with his purchase, he can learn how to most effectively return the product and get his money back.

The site handles a ton of data, from the individual policies themselves, to ads on the site, to user reviews and more. Ashutosh Panda, senior developer at the company, explained that his developers do not get a say about the data creation or input, but are responsible for the data they choose to use in their applications.

RELATED CONTENT: 
The SD Times Data Quality Project
The first step to ensuring data quality is validation
Data quality: It’s a matter of trust

“We just give the developer a set of problems and then the solution we want from him,” Panda said. “That’s it. The process that he uses is absolutely upon him. So he is the one who makes the call as to which data to include, which data not to include, to get the best results. And so I would say that, yes, the developer is essential in this process, but equally responsible are the customers we are getting the reviews from, or the data from. So yes, the developer is important, but we cannot say the developer is the person responsible for the data that we get.”

While the accuracy of data is extremely important, Panda said the biggest issue his organization faces is data authenticity. “I will give you one example. If you have bought one item, and I asked you for the review, you have experienced it, you have used it firsthand. And so you give me a review, which is really authentic, which can be trusted by other customers.”

Yet sometimes, a person sees the reviews and creates a review of his own from them — good, bad or otherwise — in an attempt to manipulate others to either make a purchase or to shop somewhere else. The review might be accurate based on the others that influenced it, but it is not authentic, because that reviewer has not used the product.

Panda said Return Policy Guide has a methodology to determine who is an authentic responder that includes a series of questions about a particular profession or age range. “So before coming to our original set of questions, we take them through a series of three to five questions, and their answers define the authenticity of the data that we get on our next set of questions, the original set of questions that we were going to pose to them.”
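
Panda did not spell out the mechanics, but the general shape of such a screen can be sketched: compare a respondent’s answers to a few warm-up questions against what would be consistent with the profile they claim, and only trust the main responses when enough answers line up. Everything in this TypeScript sketch (the fields, the normalization, the 80% pass threshold) is a hypothetical illustration, not Return Policy Guide’s actual method.

```typescript
// Hypothetical sketch of pre-screening respondents for authenticity.
// The fields, normalization and pass threshold are all assumptions.
interface ScreeningAnswer {
  question: string;
  answer: string;
  // Answers that would be consistent with the profile the respondent claims.
  expectedForProfile: string[];
}

function isLikelyAuthentic(
  answers: ScreeningAnswer[],
  passRatio = 0.8
): boolean {
  const consistent = answers.filter((a) =>
    a.expectedForProfile.includes(a.answer.trim().toLowerCase())
  ).length;
  // Trust the respondent only if enough screening answers line up.
  return consistent / Math.max(answers.length, 1) >= passRatio;
}
```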

The amount of time developers need to spend to ensure the data they use is of good quality depends on the quality of the data set they are provided, Panda said, as well as the question that is posed. So time can be defined as working time, or the number of hours given to a single data set. If they’re asking a broad question, they’ll use all the data that comes in. But if there is something very specific to be found out, he said, “Before we take the data, we have to have like a week of data cleaning, data managing and data validation. Then we need to sit for at least two, three hours each day for a couple of days to choose the correct type of questions and to decide what kind of people — range of age, or profession — the question should be posed to. So even before receiving the data set, developers start working on data so when a person comes to us, we’re making sure that the data set we receive back is a good data set to start our work with. If it’s very dirty, then the developer has to sit for like eight hours a day, for a week or two weeks, to get it right and then put it into the model for the best results.”

The first step to ensuring data quality is validation (Oct. 22, 2020)

Blue Compass is an Iowa-based digital marketing agency specializing in website development and SEO. As such, according to development manager David Wanat, they take care of “everything beyond the design of the site” on the back end.

Not only that, but Wanat also said he is responsible for ensuring the data is good, whether it’s internal data or coming in from another source. So for him, the first step toward data quality is validation.

“We’ve got articles and blog posts on our site, we have RSS feeds, we just finished an airport website, so there’s parking information, like how many spots are in a lot, or is this flight on time, or is it delayed? Some of it is user inputted through a WYSIWYG engine or through an API,” he explained. “We’re talking to another site that gives us information, like REST calls, or maybe a CSV file is uploaded via FTP, and we dig through that to find information. There’s all kinds of different sources for this data. And some is end-user driven, where they’ll put in information requests via a web page.”

RELATED CONTENT:
The SD Times Data Quality Project
Data is key to returns on investment
Data quality: It’s a matter of trust

One way Blue Compass ensures good data is being input into their forms is by limiting the amount of free-form data users have to type in. Wanat explained the company first has to think ahead about what they intend to do with the data, and minimize user input to must-haves, like inputting your name. “But if I can use a calendar date picker to put a date in instead of you free-forming the date, that’d be way better in my world, because I can control the format from the date picker,” he said. “If you’re picking a preference — a size of a shirt, a color — I’m going to control that as much as possible so I get the color red instead of burnt umber, so I know exactly which one they’re picking.”
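
In code, that philosophy amounts to modeling each field as a closed set of values or a typed picker rather than free text, so an invalid format can’t reach the database in the first place. A minimal TypeScript sketch, with the field names assumed for illustration:

```typescript
// Illustrative sketch: prefer closed value sets and typed fields to free text.
const SHIRT_COLORS = ["red", "blue", "black"] as const;
type ShirtColor = (typeof SHIRT_COLORS)[number];

interface OrderForm {
  name: string;      // free text only where it is unavoidable
  color: ShirtColor; // picker-backed: "red", never "burnt umber"
  pickupDate: Date;  // produced by a date picker, so the format is controlled
}

function parseColor(input: string): ShirtColor | null {
  const value = input.trim().toLowerCase();
  // Reject anything outside the controlled list rather than guessing.
  return (SHIRT_COLORS as readonly string[]).includes(value)
    ? (value as ShirtColor)
    : null;
}
```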

But there are cases where the data input could be of good quality but something still is wrong. 

“If you’re asking people a question, and 50 percent of them respond with almost the same exact answer they typed in, that doesn’t seem like it’s very unique,” Wanat pointed out. “If you’re asking people what they had for lunch, and everybody says a ham sandwich or pizza, instead of like… you would expect it to be a very wide difference. So if I see the exact same answer, that tells me something’s off here. You have to figure out what you’re expecting to get, and when you get something that seems off, it probably is.”
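
That kind of gut check is straightforward to automate: normalize the free-text answers, count how often each one appears, and flag the question when a single answer dominates. A minimal sketch, using the 50 percent figure from Wanat’s example as the threshold:

```typescript
// Flag a free-text question when one answer dominates the responses,
// which suggests copy-paste or bot traffic rather than genuine variety.
function dominantAnswerShare(answers: string[]): number {
  const counts = new Map<string, number>();
  for (const a of answers) {
    const key = a.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  if (answers.length === 0) return 0;
  return Math.max(...Array.from(counts.values())) / answers.length;
}

// Example: three of four answers normalize to "ham sandwich".
const responses = ["ham sandwich", "Ham Sandwich", "pizza", "ham sandwich "];
const suspicious = dominantAnswerShare(responses) >= 0.5; // true
```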

Yet in spite of these controls, bad data still is unavoidable. When that happens, Wanat turns to the use of data validators. He explained the company will do some quick tests internally on the data, and depending upon what they find, they might use machine learning to understand why the bad data is getting through.

Wanat said they also check the length of the input, to see if it aligns with what they are expecting. “If somebody’s typing in an address, it shouldn’t be very long,” he said. “If it’s over 200 characters long, that’s a problem.” Further, he said, they will scan data for some quick text validation, looking for script tags or special characters that should not be in there. If found, he said they will either “code that out, or invalidate it altogether and send [the user] back to the info form.”

Those kinds of checks happen before the data gets into the database. But if something gets through those checks, they will again validate that input before bringing the information back out of the database.
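
A minimal sketch of the checks Wanat describes: a length ceiling and a scan for script tags and unexpected characters before the write, with the same validation applied again on the way out. The 200-character limit comes from his address example; the regular expressions and the allowed character set are assumptions.

```typescript
// Defensive validation on the way into and out of storage.
// The 200-character limit is from Wanat's address example; the
// regexes and allowed character set are illustrative assumptions.
const SCRIPT_TAG = /<\s*script/i;
const ALLOWED = /^[\p{L}\p{N}\s.,'#\/-]*$/u;

function validateAddress(input: string): string | null {
  if (input.length > 200) return null;     // unexpectedly long: reject
  if (SCRIPT_TAG.test(input)) return null; // script injection attempt
  if (!ALLOWED.test(input)) return null;   // unexpected special characters
  return input.trim();
}

function readAddress(stored: string): string {
  // Validate again when reading, in case bad data slipped past the
  // write-side checks or arrived through another path.
  return validateAddress(stored) ?? "";
}
```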

As you would expect, this can take up quite a bit of a developer’s time. In a survey of developers on data quality issues SD Times completed in August, respondents indicated that they spend about one day per work week on data quality issues. Wanat agreed with that sentiment.

“You can write a web page or a web form that takes input in a few minutes,” he said. “But if I have to add validators for this, when I have to scan for that, if I had to code it, sort it in the database, now I’ve quadrupled the amount of time it’s taking me to do this one thing.

“It’s just part of what we’re doing, and ensures our clients are getting what they want,” he continued. “No one wants to say, ‘Oh we had a script injection and all the data was erased from the database.’ “

If Blue Compass’ clients can pay once to have good data coming in, then they save that time continuously after that because they’re getting a higher quality product, Wanat explained.

The SD Times Data Quality Project (Oct. 19, 2020)

It began last January with a column I wrote titled “The little dirty data secret,” which shone a light on the issue of data quality — or lack thereof — within many organizations. I called it the dirty little secret because of a reluctance on the part of many to even acknowledge they had a problem.

I spoke with Greg Brown at data quality solution provider Melissa about it, who said for many organizations, poor data quality is “the cost of doing business.” We spoke about the issue in their California offices, and from that conversation was born the SD Times Data Quality Project.

After our initial meeting, Brown worked with us to create a survey of developers asking how much responsibility they had for data quality and what their role was in ensuring the data going into their applications was of high quality. In short, more than half of the 202 respondents said they were involved in data quality input, data quality management, choosing validation APIs or API data quality solutions, and data integration.

RELATED CONTENT:
Survey: Developers claim responsibility for managing data quality
The first step to ensuring data quality is validation
Data is key to returns on investment
Data quality: It’s a matter of trust

To help us better understand the issue, Brown described six defined dimensions of data quality. These standard measures are accuracy, timeliness, consistency, validity, uniqueness and completeness.
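
One way to make those dimensions actionable is to track them as a per-dataset scorecard. The TypeScript sketch below is only an illustration: the 0-to-1 scale and the equal weighting are assumptions, not Melissa’s methodology.

```typescript
// Illustrative scorecard for the six dimensions described above.
// The 0-to-1 scale and equal weighting are assumptions for the sketch.
interface DataQualityScore {
  accuracy: number;
  timeliness: number;
  consistency: number;
  validity: number;
  uniqueness: number;
  completeness: number;
}

function overallScore(s: DataQualityScore): number {
  const dims = [
    s.accuracy, s.timeliness, s.consistency,
    s.validity, s.uniqueness, s.completeness,
  ];
  return dims.reduce((sum, d) => sum + d, 0) / dims.length;
}
```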

We asked people who took the survey if they would share their stories, and several agreed. They will be appearing on sdtimes.com in the coming weeks. We’ll hear where their data problems exist, what they’re doing to remediate those problems, and where they are at now. Among the issues we’ll be talking about are data integrity, poor documentation, a lack of training for dealing with data, cleaning and optimization, and data management.

In a world where the amount of data organizations are handling has exploded, maintaining quality takes more time and effort than ever before. We’re looking forward to bringing you stories from the field of how organizations today are managing. Join us for the journey.
