FIT MGT5114 – Wk4 Discussion 1 Peer Response

Good post, and you are certainly in the majority with your perspective on duplicate records in a database and their negative impact on DB integrity. My only issue with this question and the responses is that duplicate “DB” records are explored primarily in the context of RDBMS. Professor Karadsheh mentions Big Data in a few response posts. Big Data and the emergence of NoSQL and document databases have challenged some of the concepts firmly rooted in legacy RDBMS best practices, where relationships and table joins are foundational and duplicate data typically presents a significant problem. At a high level, SQL databases rely on structured data: tables with fields, normalized data inserted into those fields, relationships between tables, and SQL statements to return results. It’s easy to see the pitfalls of duplication in the context of an RDBMS.

NoSQL or document databases use a key-value paradigm, where keys and values are defined as unstructured, denormalized data is ingested. A good example of this is opening a stream from the Twitter API for something like sentiment analysis. I use this example because I am a heavy user of ElasticSearch (a NoSQL DB) for log and sentiment analysis. The benefit of NoSQL is the ability to ingest thousands of unstructured, denormalized records per second, with each record stored as key-value pairs that map keys to data (values).
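To make the contrast concrete, here is a minimal sketch of my own (not taken from any of the posts). It uses Python's built-in sqlite3 module for the relational side and a plain list of dictionaries as a stand-in for a document store. The table name, field names, and sample values are all hypothetical.

```python
import sqlite3

# Relational side: a primary key lets the database itself reject a duplicate row.
# The "customer" table and its columns are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # Second insert with the same key violates the PRIMARY KEY constraint.
    conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Ada"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# Document side: a plain list stands in for a document store (an assumption,
# not a real NoSQL client). Identical key-value records are simply appended.
documents = []
record = {"user": "ada", "text": "great speech"}
documents.append(dict(record))
documents.append(dict(record))

print(duplicate_rejected, row_count, len(documents))
```

In the relational case the constraint is what protects integrity; in the document case nothing is violated by storing the same record twice, which is the point I am making about context.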

Here is an example use of ElasticSearch: a data stream is opened using the Twitter API, the data stream is pushed into ElasticSearch, and then Kibana is used to visualize sentiment. In this case, duplicate records don’t indicate that the integrity of the database is suspect, time series don’t matter, etc. What is important is the ability to stream a high volume of messages per second, use an NLP library to determine sentiment, create a JSON record containing key/value pairs, and add it to ElasticSearch.
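The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny word lists stand in for a real NLP library, the in-memory list stands in for an ElasticSearch index, and the sample messages stand in for the Twitter stream. None of the names below come from my actual setup.

```python
import json
from datetime import datetime, timezone

# Hypothetical word lists; a real pipeline would call an NLP library here.
POSITIVE = {"great", "good", "love", "win"}
NEGATIVE = {"bad", "sad", "hate", "fail"}


def score_sentiment(text):
    """Toy scorer: count positive words minus negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def build_record(user, text):
    """Create the key/value record that would be sent to the index."""
    return {
        "user": user,
        "text": text,
        "sentiment": score_sentiment(text),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


index = []  # stand-in for an ElasticSearch index (assumption, not a real client)


def ingest(stream):
    """Serialize each message to JSON and append it; duplicates are kept."""
    for user, text in stream:
        index.append(json.dumps(build_record(user, text)))


# Stand-in for the live stream; the first and last messages are identical.
sample_stream = [
    ("a", "Great speech, love it!"),
    ("b", "What a bad, sad day"),
    ("a", "Great speech, love it!"),
]
ingest(sample_stream)
print(len(index))
```

Note that the repeated message is ingested twice without complaint. For sentiment analysis that is the desired behavior, since a retweeted or repeated message is still a data point.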

ElasticSearch records look like this: http://www.awesomescreenshot.com/image/2357496/22cb647c962eb32ee38e8ad8ee3c13d5
POTUS Sentiment Analysis using ElasticSearch and Kibana: http://gotitsolutions.org/2017/02/24/potus-sentiment-analysis/

As with so many things, I think the answer to this question, in a context that defines DB as more than just RDBMS, is: it depends. With that said, I do agree that duplication in the context of a traditional RDBMS can wreak havoc on data integrity.

References

Bocchinfuso, R. J. (2017, March 31). POTUS Sentiment Analysis. Retrieved April 02, 2017, from http://gotitsolutions.org/2017/02/24/potus-sentiment-analysis/

Issac, L. P. (2014, January 14). SQL vs NoSQL Database Differences Explained with few Example DB. Retrieved April 02, 2017, from http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/?utm_source=tuicool

Pfleeger, C. P., Pfleeger, S. L., & Margulies, J. (2015). Security in computing (5th ed.). Upper Saddle River: Prentice Hall.

Richard J. Bocchinfuso
