How does data quality impact business performance? Using your textbook as a resource, describe the functions of database technology, the differences between centralized and distributed database architecture, how data quality impacts performance, and the role of a master reference file in creating accurate and consistent data across the enterprise.

Poor data quality, such as missing or erroneous data, disrupts operations and can cost an organization business, reducing revenues and profits. Missing or erroneous data can cut into current revenues, frustrate customers, and put an organization's reputation at risk, which places both existing and future business at stake. Data quality issues can also decrease efficiency and increase costs: a lack of confidence in data integrity causes organizations to spend time and money on data validation and error correction.

The goal of a database is to store data in a structured way so that it can be queried and maintained. Two popular categories of database are SQL and NoSQL. SQL databases, or relational database management systems (RDBMS), are relational and use a defined schema. NoSQL databases, such as document and key-value stores, are often schemaless and rely on key-value pairs defined at ingest. Ingesting (or inserting) data into a specified schema makes data integrity easier to manage than defining the key-value pairs at the time of ingest, because the schema can reject records that do not conform.
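The difference in integrity checking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customer` table, its columns, and the sample records are hypothetical, and a plain list of dictionaries stands in for a schemaless store rather than any particular NoSQL product.

```python
import sqlite3

# Hypothetical customer table; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL CHECK (email LIKE '%_@_%')
    )
    """
)

good = {"id": 1, "name": "Ada Lopez", "email": "ada@example.com"}
bad = {"id": 2, "name": "Bo Chen", "email": None}  # missing email


def insert_relational(record):
    """Return True if the schema accepts the record, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (id, name, email) "
                "VALUES (:id, :name, :email)",
                record,
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Stand-in for a schemaless store: it accepts whatever it is given.
document_store = []


def insert_document(record):
    """Store the record as-is; no structure is enforced at ingest."""
    document_store.append(dict(record))
    return True


relational_results = [insert_relational(good), insert_relational(bad)]
document_results = [insert_document(good), insert_document(bad)]
print(relational_results)  # [True, False]
print(document_results)    # [True, True]
```

The relational table rejects the record with the missing email at insert time, while the schemaless stand-in stores it, leaving the problem to be found later by validation code or by a frustrated user.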

Centralized and distributed database architectures differ in where data is stored and controlled. Centralized architectures keep the storage and control of data in one place, while distributed architectures allow data to be stored on edge devices such as laptops, tablets, and mobile devices, or distributed across servers using master/master, master/slave, or parent/child relationships. Centralized architectures offer greater control over data quality because all data is stored in a single physical location, so additions, updates, and deletions can be made in a supervised and orderly fashion. They also allow for better security: it is easier to control physical and logical access to a centralized architecture, and the attack surface is smaller than that of a distributed system.
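One way to see why a single location makes data easier to keep consistent is a toy model of a master/slave arrangement. The sketch below is an assumption-laden illustration, not how any specific product replicates data: the `Node` and `ReplicatedStore` classes, the key name, and the manual `sync` step are all hypothetical.

```python
class Node:
    """One database node holding its own key-value copy of the data."""

    def __init__(self):
        self.data = {}


class ReplicatedStore:
    """A master node plus replicas that receive writes only when synced."""

    def __init__(self, replica_count):
        self.master = Node()
        self.replicas = [Node() for _ in range(replica_count)]
        self.pending = []  # writes applied to the master but not yet shipped

    def write(self, key, value):
        """Apply the write to the master and queue it for the replicas."""
        self.master.data[key] = value
        self.pending.append((key, value))

    def sync(self):
        """Ship all queued writes to every replica, in order."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


store = ReplicatedStore(replica_count=2)
store.write("customer:1:email", "ada@example.com")
store.sync()

# The master is updated, but the replicas have not been synced yet.
store.write("customer:1:email", "ada@newmail.example")
fresh_read = store.master.data["customer:1:email"]
stale_read = store.replicas[0].data["customer:1:email"]

store.sync()
after_sync = store.replicas[0].data["customer:1:email"]
```

Between the second write and the second `sync`, a reader on a replica sees the old email address while a reader on the master sees the new one. A centralized database has only one copy, so this kind of temporary disagreement cannot arise.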

Centralized and distributed database architectures each come with tradeoffs that should be considered when selecting an appropriate architecture. With more and more processing being pushed to the edge (e.g., mobile and IoT growth) and with ever-increasing big data demands, decentralized distributed databases like Apache Cassandra and RethinkDB are experiencing massive growth. Centralized databases like Microsoft SQL Server, MariaDB, and others are still very prominent, but even these vendors are adapting their products to support distributed architectures and capitalize on the big data revolution.

Master reference files provide a common point of reference and act as a single source of truth for a given data entity. Data entities might include customer, product, supplier, employee, or asset data. As a single source of truth, master reference files are used to feed data into enterprise systems and to maintain data quality and integrity.
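The idea of feeding enterprise systems from a master reference file can be sketched as a simple lookup. Everything here is hypothetical and for illustration only: the `MASTER_CUSTOMERS` data, the customer IDs, the field names, and the `conform` helper are assumptions, not part of any real system.

```python
# Hypothetical master reference data, keyed by customer ID.
MASTER_CUSTOMERS = {
    "C001": {"name": "Acme Corporation", "country": "US"},
    "C002": {"name": "Globex GmbH", "country": "DE"},
}


def conform(record):
    """Overwrite shared attributes with master values; flag unknown IDs."""
    master = MASTER_CUSTOMERS.get(record["customer_id"])
    if master is None:
        # No master entry: keep the record but mark it for review.
        return {**record, "status": "unmatched"}
    # Master values win over whatever the local system stored.
    return {**record, **master, "status": "conformed"}


# The same customer as spelled by two different (hypothetical) systems.
crm_record = {"customer_id": "C001", "name": "ACME Corp.", "phone": "555-0100"}
billing_record = {"customer_id": "C001", "name": "Acme Corporation Inc", "balance": 120.0}
unknown_record = {"customer_id": "C999", "name": "Initech"}

conformed = [conform(r) for r in (crm_record, billing_record, unknown_record)]
for row in conformed:
    print(row["customer_id"], row["name"], row["status"])
```

After conforming, both systems carry the same customer name from the master file while keeping their own local fields (phone, balance), and the record with no master entry is flagged rather than silently accepted.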

References

Buckler, Craig. “SQL vs NoSQL: The Differences — SitePoint.” SitePoint, SitePoint, 18 Sept. 2015, www.sitepoint.com/sql-vs-nosql-differences/. Accessed 14 Sept. 2017.

“Do You Know How Data Quality Impacts Your Business?” BackOffice Associates, 23 July, resources.boaweb.com/backoffice-blog/do-you-know-how-data-quality-impacts-your-business. Accessed 14 Sept. 2017.

Turban, Efraim, et al. Information Technology for Management: Digital Strategies for Insight, Action, and Sustainable Performance. New Jersey (United States), Wiley, 2015.