MSIT Cybersecurity – Page 10 – Richard J. Bocchinfuso

FIT – MGT 5000 – Week 2

Discussion Post

1. Which was larger for Under Armour, Inc. during 2014: (1) sales revenue, or (2) cash collected from customers? Why? Show computation. (Challenge)

2014 Sales revenue = $3,084,370 > Cash collected from customers in 2014 = $3,014,487

2. Investors are vitally interested in a company’s sales and profits and its trends of sales and profits over time. Consider Under Armour, Inc.’s sales and net income (net loss) during the period from 2012 through 2014. Compute the percentage increase or decrease in net sales and also in net income (net loss) from 2012 to 2014. Which item grew faster during this two-year period—net sales or net income (net loss)? Can you offer a possible explanation for these changes? (Challenge)

Net sales (revenue) grew faster than net income.
A potential explanation for income growth not keeping pace with or outgrowing revenue growth is expense increases which grew at a greater rate than revenue.

Note: Detailed calculations and explanations contained in the attached spreadsheet.

Detailed Calculations

[google-drive-embed url=”https://drive.google.com/file/d/0B1fr2Qqx-moWZDl4RjVzcFBlcTg/preview?usp=drivesdk” title=”Bocchinfuso_FIT-MGT5000_Week2-Chapter2_FOA_UA.xlsx” icon=”https://drive-thirdparty.googleusercontent.com/16/type/application/vnd.openxmlformats-officedocument.spreadsheetml.sheet” width=”100%” height=”400″ style=”embed”]

Assignments

[google-drive-embed url=”https://drive.google.com/file/d/0B1fr2Qqx-moWWjV0NUJtclNPWTQ/preview?usp=drivesdk” title=”Bocchinfuso_FIT-MGT5000_Week2-Chapter2-Problems.xlsx” icon=”https://drive-thirdparty.googleusercontent.com/16/type/application/vnd.openxmlformats-officedocument.spreadsheetml.sheet” width=”100%” height=”400″ style=”embed”]

Published by rbocchinfuso on May 30, 2017 | Permalink

FIT – MGT 5000 – Week 1

Discussion Post

1. Go on the Internet and do some research on Under Armour, Inc., and its industry. Use one or more popular websites like http://finance.yahoo.com or http://www.google.com/finance. Write a paragraph (about 100 words) that describes the industry, some current developments, and a projection for where the industry is headed.

Founded in 1996 by former University of Maryland football player Kevin Plank, Under Armour claims to be the originator of performance apparel. Under Armour differentiates their apparel by stating it keeps athletes cool, dry and light throughout the course of a game, practice or workout. Under Armour has experienced excellent market traction and has risen to be one of the top performance apparel providers in the word. Under Armour is the official footwear supplier of the NFL and MLB and partners with the NBA. Under Armour heavily relies on athlete endorsements from top performers in football, basketball, soccer, and baseball. Under Armour’s products are made from its moisture-wicking and heat-dispersing fabrics which keep athletes dry and relatively comfortable. Under Armour is entering the technology space and leveraging their foothold in the apparel market to enter the wearable market to allow customers track their fitness. Under Armour sells online, by catalog, through its own retail and outlet stores, and in more than 25,000 retail stores worldwide.

Under Armour is in the apparel manufacturing company that develops, manufactures, markets and sell its own products. Under Armour is an apparel maker and retailer and they are impacted by trends in both the apparel and retail sectors. As Under Armour enters the tech sector they will be impacted by technology adoption and obsolescence, this will be interesting to watch.

Under Armour is being impacted by a struggling retail sector. After years of steady, double-digit growth, the high-performance athletic apparel and footwear maker reported 7% growth on revenues of $1.1 billion with a net loss of $2 million in the first quarter 2017 and a $0.01 loss in diluted earnings per share. Stiff competition and lost U.S. retailers (e.g. – Sports Authority & Sports Chalet) are sighted as having a material impact on Under Armour’s results. Source: http://www.uabiz.com/results.cfm

While Under Armour had a difficult Q1 2017 their balance sheet remains healthy.

Competition includes the likes of Nike, Adidas, and Columbia Sportswear. Increased competition may have played a role in gross profits being down 70 basis points to 45.2 percent.

The apparel industry is highly competitive and this trend will continue. Under Armour needs to continue to innovate, they need to continue to expand both direct-to-consumer and international sales. Under Armour will also have to continue to pursue new retailers while maintaining margins.

2. Read Note 1—(Description of Business) of Under Armour, Inc.’s annual report. What do you learn here and why is it important?

The Description of Business in the 10-K filing is exactly that, a description of what the company does, any subsidiaries it might own, what markets it operates in. This section might also include significant recent events like competition, regulations, or labor issues, special operating costs, seasonality, etc… This is a summary of what the business is and how it operates.

Under Armour is in the apparel business.
Under Armour develops, markets and distributed branded apparel, footwear, and accessories.
Under Armour is focused on athletic apparel.
Under Armour conducts worldwide operations.
Under Armour is an active lifestyle brand.

3. Name two of Under Armour, Inc.’s competitors. Why is this information important in evaluating Under Armour, Inc.’s financial performance?

Nike and Adidas are the two competitors pointed out in Under Armout, Inc.’s 2014 10-K filing. (Harrison, Horngren & Thomas, 2017, p. 855)
Understanding the market segment, the trend within that segment and the total market opportunity is critical to understanding is how Under Armour is performing.
Competition provides both market opportunity, the ability to share shift an already defined market with a proven need but it also creates margin pressure. This greater the supply, the greater the competition the more advantageous it is for the consumer. Margin erosion is likely with increased competition. Under Armour, like it’s competition Nike and Adidas attempt to offset what could be called commodity apparel by creating “lifestyle” brands and creating “brand loyalty”

4. Write Under Armour, Inc.’s accounting equation at December 31, 2014 (express all items in millions and round to the nearest $1 million). Does Under Armour, Inc.’s financial condition look strong or weak? How can you tell?

Assets = Liabilities + Owners’ Equity (Harrison, Horngren & Thomas, 2017, p. 12)
$2,095,083 = 744,883 + $1,350,300

Owner’s (stockholder’s) equity has show y/y growth indicating that Asset growth is outpacing liability growth, indicating a strong financial position and a healthy trajectory.
Owner’s Equity = Assets – Liabilities

5. What was the result of Under Armour, Inc.’s operations during 2014? Identify both the name and the dollar amount of the result of operations for 2014. Does an increase (or decrease) signal good news or bad news for the company and its stockholders?

2014 Net Income (in thousands) = $208,042
Net Income (in thousands) increased from 162,330 in 2013 to $208,042.

6. Examine retained earnings in the Consolidated Statements of Shareholders’ Equity. What caused retained earnings to increase during 2014?

2014 Retained earnings (in thousands) = 856,867
2013 Retained earnings (in thousands) = 653,842

When revenues exceed expenses the result will be a positive balance in retained earnings. (Harrison, Horngren & Thomas, 2017, p. 18)

7. Which statement reports cash and cash equivalents as part of Under Armour, Inc.’s financial position? Which statement tells why cash and cash equivalents increased (or decreased) during the year? Which activities caused Under Armour, Inc.’s cash and cash equivalents to change during 2014, and how much did each activity provide or use?

Which statement reports cash and cash equivalents as part of Under Armour, Inc.’s financial position? Consolidated Balance Sheet
Which statement tells why cash and cash equivalents increased (or decreased) during the year? Consolidated Statement of Cash Flows
Which activities caused Under Armour, Inc.’s cash and cash equivalents to change during 2014, and how much did each activity provide or use?

Cash and cash equivilents – Beginning of period (in thousands) = 347,489
+ 2014 Cash flows from operating activities (in thousands) = 219,033
+ 2014 Cash flows from investing activities (in thousands) = (153,312)
+ 2014 Cash flows from financing activities (in thousands) = 245,686
+ 2014 Effect of exchange rate changes on cash and cash equivilents = (3,341)
= Cash and cash equivilents – Beginning of period (in thousands) = 347,489

347,489 + 219,033 + (153,312) + 245,686 + (3,341) = 593,175

References

Hall, J. (1970, January 01). Under Armour Inc Is Feeling the Impact of a Changing Retail Landscape. Retrieved May 04, 2017, from https://www.fool.com/investing/2017/04/28/under-armour-inc-is-feeling-the-impact-of-a-changi.aspx

Harrison, W. T., Horngren, C. T., & Thomas, C. W. (2017). Financial accounting. Boston: Pearson.

Assignments

[google-drive-embed url=”https://drive.google.com/file/d/0B1fr2Qqx-moWTUhabUE0WkZpWDQ/preview?usp=drivesdk” title=”Bocchinfuso_FIT-MGT5000_Week1-Chapter1-Problems.xlsx” icon=”https://drive-thirdparty.googleusercontent.com/16/type/application/vnd.openxmlformats-officedocument.spreadsheetml.sheet” width=”100%” height=”400″ style=”embed”]

Published by rbocchinfuso on May 30, 2017 | Permalink

FIT – MGT5114 – Week8 – Discussion 10

Discuss how having your personal information in online databases may lead to identity theft. How can you protect yourself from this?

Our personal data litters the Internet and as the digital world provides more convenience out digital footprint and the attack surface continues to grow. Most of us continue to trade convenience for security, so there is no end in sight. I could not help but think about the Wired editor who had his digital identity wiped when I read this question, if you haven’t read this story I highly recommend it. The ease with which a hacker can gain access to a single piece of information and use it as the catalyst to take over a person’s life is astounding. Vulnerabilities come in all forms but what is interesting about the Wired editor story is that the vulnerability was in the process and exploited via social engineering. The growth of Identity Theft Insurance demonstrates how real identity theft is. One protection approach is to limit what we store online, for instance when that little check-box pops up asking to save your credit card information don’t click it. More and more organizations are encrypting or hashing personal/private data that they store in online databases, but we still have to be careful. I used to use an expiring credit card designed for online purchasing; the system would generate a temporary credit card number with a credit limit equal to what I was going to purchase, this was a good system, but it became cumbersome, so I traded security for convenience. I try not to use the same passwords because if one online database is compromised, I don’t want to give the person with the data the keys to my kingdom by using the same password everywhere. It’s also important to recognize how important password length and complexity is, tools like hashcat and cloud computing have made cracking simple passwords a trivial and speedy task, what used to take years now takes minutes.

References

Honan, M. (2012, August 06). How Apple and Amazon Security Flaws Led to My Epic Hacking. Retrieved April 26, 2017, from https://www.wired.com/2012/08/apple-amazon-mat-honan-hacking/

Pascal, A. (2014, February 27). Online Identity Theft Statistics – And How to Protect Yourself. Retrieved April 26, 2017, from http://eggtoapples.com/blog/online-identity-theft-statistics-and-how-to-protect-yourself/

Published by rbocchinfuso on April 27, 2017 | Permalink

FIT – MGT5114 – Week8 – Discussion 9

Law and ethics are often both considerations when determining the reaction to a computer security incident. For instance, scanning for open wireless networks is not illegal unless the scanner connects to the network without permission. Discuss this issue in terms of the legal and ethical issues that surround using a wireless connection that you do not own.

Wireless network scanning is not unlawful, and the ethics should be determined by the intent of the individual doing the scanning. For instance, if the person is scanning the network to identify unsecured wireless access points or access points with weak encryption protocols like WEP so they can conduct an attack to gain unauthorized access it would clearly be unethical. Tools like aircrack-ng allow hackers to identify access points, obfuscate themselves, promiscuously collect packets and crack WEP keys and WPA passwords.

There is an argument that piggybacking on open wifi access points (APs) in not unethical. Those who argue this perspective state that some APs are intentionally left open so identifying wifi piggybacking on any open AP as unethical would be an incorrect assessment. Those who argue the unethical position state that it is unethical to cheat the ISP out of their revenue. There is the case of the man who was charged with a crime for using a cafe’s wifi by sitting outside and piggybacking from his car; this was deemed unlawful because the free wifi was intended for patrons, which he was not. I think the reality is that it’s hard to know if an open AP was left open intentionally or unintentionally, services like WiGLE provide data regarding “free” wif access points. With regards to the argument that it’s stealing from the ISP thus unethical, I would need to look at the ISPs terms of service. I know of many coffee shops who have residential class Internet service and provide “free” wifi to their patrons so it would seem that sharing you ISP connection via wifi is not illegal or unethical. We live in a connected world, and I think jumping on an open wifi AP has become a way of life thus moral intent is important when deciding if this behavior is ethical or unethical. No doubt this is a topic which is open to debate.

References

Cheng – May 22, 2007 3:37 pm UTC, J. (2007, May 22). Michigan man arrested for using cafe’s free WiFi from his car. Retrieved April 26, 2017, from https://arstechnica.com/tech-policy/2007/05/michigan-man-arrested-for-using-cafes-free-wifi-from-his-car/

Bangeman, E. – Jan 4, 2008 3:12 am UTC. (2008, January 03). The ethics of “stealing” a WiFi connection. Retrieved April 26, 2017, from https://arstechnica.com/security/2008/01/the-ethics-of-stealing-a-wifi-connection/

Pash, A. (2008, January 04). The Ethics of Wi-Fi “Stealing”. Retrieved April 26, 2017, from http://lifehacker.com/340716/the-ethics-of-wi-fi-stealing

Writer, L. G. (2011, September 10). Is it Legal to Piggyback WiFi? Retrieved April 26, 2017, from http://smallbusiness.chron.com/legal-piggyback-wifi-28287.html

Published by rbocchinfuso on April 27, 2017 | Permalink

FIT MGT5114 – Wk7 Discussion 1 Post

Security and risk are clearly related; the more at-risk a system or data set is the more security is desirable to protect it. Discuss how prices for security products may be tied to the degree of risk. That is, will people or organizations be willing to pay more if the risk is higher?

Absolutely, maybe, hmmm, what a complex world we live in. There is seemingly a direct correlation between, value of assets, reputation, etc… and the risk associated with a potential vulnerability, a successful exploit and what an organization is willing to pay to protect themselves. Some market segments make the decision to spend on security products clearer by imposing regulatory requirements that make the cost of non-compliance steep enough to mandate compliance.

For example:
Processing credit card transactions? You are subject to PCI DSS.
Do something regulated by the FDA? You are subject to Title 21 of the Code of Federal Regulations (21 CFR Part 11) Electronic Records
Do pretty much anything in health care? You are probably subject to Health Insurance Portability and Accountability Act (HIPAA) and The Health Information Technology for Economic and Clinical Health Act (HITECH) which means you better keep that patient data secure.

These regulations and other make the decision to invest in security products seemingly straightforward, but not everything is what it seems. Major breaches like Target who had 40 million credit and debit card records 70 million customer records (including addresses and phone numbers) lifted from their systems netted a loss of only 0.1% of their 2014 sales. The same is true of Home Depot who in 2014 had 56 million credit and debit card numbers and 53 million email addresses lifted from their systems which netted a loss of only 0.01% of their 2014 sales. These and many other firms have Cyber Liability Insurance to mitigate their losses, between payments from insurance and tax right offs the losses diminish, and so does the incentive to invest in security products.

When we look at sites like http://map.norsecorp.com/#/ that depict the velocity of attacks, and we think about the attack surface of an online entity the idea of “if there is a breach” probably should be replaced with “when there is a breach”. I would say there is some hedging occurring in the enterprise, where there is the balance between investments and projected losses due to a breach. No investment makes you hack proof, and if and when you are hacked having invested millions in technology to protect against a hack garners no reputation points so being smart about your security posture but not over investing and rolling the dice (it’s happening regardless) may be a prudent business decision. Stuxnet proved that even a facility which is off the grid is vulnerable to attack.

References

Data Breach & Cyber Liability Insurance. (n.d.). Retrieved April 19, 2017, from https://www.thehartford.com/data-breach-insurance

Michael Kassner | April 9, 2015, 12:45 PM PST. (n.d.). Data breaches may cost less than the security to prevent them. Retrieved April 19, 2017, from http://www.techrepublic.com/article/data-breaches-may-cost-less-than-the-security-to-prevent-them/

Staff, C. (2012, December 19). The security laws, regulations and guidelines directory. Retrieved April 19, 2017, from http://www.csoonline.com/article/2126072/compliance/compliance-the-security-laws-regulations-and-guidelines-directory.html#Electronic-Fund-Transfer

Zetter, K. (2014, November 03). An Unprecedented Look at Stuxnet, the World’s First Digital Weapon. Retrieved April 19, 2017, from https://www.wired.com/2014/11/countdown-to-zero-day-stuxnet/

Published by rbocchinfuso on April 26, 2017 | Permalink

FIT MGT5114 – Wk6 Discussion 1 Post

Discuss three possible inclusions in a security policy. How do they differ from those included in a business continuity plan?

“A security policy documents an organization’s security needs and priorities.” (Pfleeger, Pfleeger & Margulies, 2015, p. 671) “A security policy is a high-level statement of purpose.” (Pfleeger, Pfleeger & Margulies, 2015, p. 671) A security policy does not merely address a security posture from a technical perspective, such as identifying known vulnerabilities. A security policy is nuanced, having to take into consideration the assets which need to be protected, the value of these assets, potential regulatory concerns, etc… A security policy should consider the following:

Organizational goals.
Delegation of responsibility.
Organizational commitment.

While a security policy is a macro level statement of purpose, a security plan includes the security policy, but also includes details such as current state (the current security posture including gaps, likely the result of an assessment), requirements, recommendations, accountability (possibly in the form of a RACI matrix), timetable (project plan) and a maintenance plan focused on operational upkeep.

A “Business continuity plan documents how a business will continue to function during or after a computer security incident.” (Pfleeger, Pfleeger & Margulies, 2015, p. 681). “An ordinary security plan covers computer security during normal times (under normal operations) and deals with protecting against a wide range of vulnerabilities from usual sources.” (Pfleeger, Pfleeger & Margulies, 2015, p. 681). The text simply states that the difference between a security plan and a business continuity plan is that one is focused on establishing security guidelines that will be used during normal operations while the other is invoked by either a catastrophic failure or a prolonged outage which will negatively impact the business.

I would say that a security policy is part of business continuity plan (BCP), in other words, security policies exist inside the BCP plan. When a BCP plan is invoked due to a catastrophic event or prolonged outage, the goal of a business continuity plan is to have a playbook to return to normal operations under the worst of conditions, at which time security policies are reinstituted as part of the BCP plan. A security policy may also govern the execution of a business continuity or disaster recovery plan.

A final thought, this week’s discussion question seems to ask for a “security policy” to be contrasted with a “business continuity plan,” not a “security plan” with a “business continuity plan.” I hedged a bit with my response. 🙂

References

Pfleeger, C. P., Pfleeger, S. L., & Margulies, J. (2015). Security in computing (5th ed.). Upper Saddle River: Prentice Hall.

Published by rbocchinfuso on April 25, 2017 | Permalink

FIT MGT5114 – Wk5 Discussion 1 Post

Question:

Telecommunication network providers and users are concerned about the single point of failure in the “last mile”, which is the single cable from the network provider’s switching station to the customer’s premises. How can a customer protect against that single point of failure? Provide an analysis on whether this presents a good cost-benefit trade-off.

Response:

The obvious answer here is to have redundant providers, but redundant links alone do not provide redundancy. To truly be redundant the solution must incorporate transparent failover. This is no different that a blown electrical circuit in your home. If the freezer is connected to a circuit which blows the idea that an adjacent outlet is available to power the freezer is meaningless if your sleeping or on vacation. For a system to have no single point of failure, redundant infrastructure (the easy part) must exist, but these systems also need to be self-healing. This concept has prompted the emergence of a field called site reliability engineering; this field focuses on the self-healing aspects of information systems at scale. Consumers or SMBs looking to protect themselves from “last mile” failures via infrastructure redundancy might use a “dial-up” connection but probably not because who still has a POTS line? The more likely option is a router which will handle both and wireline broadband and wireless broadband connections. Devices like the Failsafe Gigabit N Router for Mobile Broadband from Cradlepoint provide a cost effective way to transparent circuit failover. Because most ingress and egress traffic is NAT’d on a consumer grade networks (e.g. – your home network) a move from one provider to another can be performed quickly and nondisruptively. NAT’d traffic moves between your LAN and the Internet using a single IP public address (typically a DHCP address assigned by your provider), this makes it reasonable to use this approach for redundancy.

My home network is fairly complex (some pics from my home lab) with two circuits and multiple site-to-site VPNs to cloud providers. Both my wireline circuits as well as my broadband circuit are Verizon circuits with one wireline circuit being a business grade and one being a consumer grade, I leverage wireless broadband as my tertiary Internet connection (used broadband for two weeks following Hurricane Sandy). The business circuit differs in speed from my consumer circuit, and the business circuit provides me with public facing IP space and the ability to use my own router vs. the Verison FiOS provided router, these are key differentiators between consumer circuits and business circuits. I use pfSense as my router and firewall or choice, pfSense manages all my routing and circuit failover. Because this is my home lab I do not use something like BGP to manage external traffic and allow for transparent failover, what I do is monitor my home lab circuits using a witness process which runs a check against two IP addresses. For simplicity, IP address 1 is the advertised static public facing IP address on my Verizon Business circuit, and IP 2 is the NAT’d port forwarded address on my consumer grade FiOS circuit. IP 1 maps to host.domainname and IP2 resolves to host.dyndns, where dyndns is Namecheap’s dynamic DNS service. When all is well the host is directly accessible via IP 1, if something goes wrong, the host will become available using IP 2. Obviously, the use of BGP and an AS number to facilitate failover for my home lab would be a bit costly, so the witness process watches for service availability on IP 1 or IP 2 and updates the DNS A record of the service with my domain registrar if the service becomes reachable on an alternate path. My DNS provider is Namecheap, so the witness server test the process for accessibility and then uses PyNamecheap to update the A record programmatically. With a short TTL, the DNS records propagate, and public services are again available, albeit not no all services will failover, but web services are available with a little help from NGINX and reverse proxying.

The above is not very expensive from a pure infrastructure perspective. The consulting may be a bit costly if you are not capable of configuring it yourself but the cost to build in redundancy is getting lower and lower. Cloud providers like AWS with services Route 53, S3 and Lambda make it very cost effective to leverage all of their site reliability engineering to build disaster tolerant systems cost effectively without every worrying about the physical infrastructure. Is the time, energy and money worth it and is there an ROI depends on what you are looking to accomplish and the value of the services you are providing. I require public IP address space, not offered on a Verizon FiOS consumer grade circuits; I need a consumer grade Verison FiOS line for TV, the internet, and telephone service. For these reasons, it made sense for me to leverage the consumer grade line as a backup to provide access to critical systems and services in the event of something like a physical fiber cut, which has happened with the landscaper putting a shovel through the fiber (there are two fiber runs from the street to my house).

References

Pfleeger, C. P., Pfleeger, S. L., & Margulies, J. (2015). Security in computing (5th ed.). Upper Saddle River: Prentice Hall.

Published by rbocchinfuso on April 5, 2017 | Permalink

FIT MGT5114 – Wk4 Discussion 1 Peer Response

Good post and you are certainly in the majority with your perspective regarding the existence of duplicate records in a database and the negative impact on DB integrity. My only issue with this question and the responses is the idea that duplicate “DB” records is primarily explored in the context of RDBMS. Professor Karadsheh mentions Big Data in a few response posts, Big Data and the emergence of NoSQL and Document Databases have challenged some of the concepts firmly rooted in legacy RDBMS best practices where relationships and table joins are foundational, and duplicate data typically presents a significant problem. At a high-level SQL database rely on structured data, tables with fields, normalized data inserted into these fields, relationships between tables and SQL statements to return results. It’s easy to see the pitfalls of a duplication in the context of an RDBMS. NoSQL or Document Databases use a key-value store paradigm, where keys and values are defined when unstructured, denormalized data is ingested. A good example of this is opening a stream from the Twitter API for something like sentiment analysis. I use this as an example because I am a heavy user of ElasticSearch (a NoSQL DB) for log and sentiment analysis. The benefit of NoSQL is the ability to ingest thousands of unstructured, denormalized records per second; these unstructured, denormalized records use key-value pairs to map the keys to data (value).

Here is an example use of ElasticSearch: A data stream is open using the Twitter API, the data stream is pushed into ElasticSearch and then Kibana is used to visualize sentiment. In this case, duplicate records don’t indicate that the that the integrity of the database is suspect, time series don’t matter, etc… What is important is the ability to stream of messages per seconds, use an NLP library to determine sentiment, create a JSON record containing key/value pairs and add to ElasticSearch.

ElasticSearch records look like this: http://www.awesomescreenshot.com/image/2357496/22cb647c962eb32ee38e8ad8ee3c13d5
POTUS Sentiment Analysis using ElasticSearch and Kibana: http://gotitsolutions.org/2017/02/24/potus-sentiment-analysis/

Like so many things I think the answer to this question in a context which defines DB as more than just RDBMS is, it depends. With that said I do agree that duplication in the context of traditional RDBMS can wreak havoc on data integrity.

References

Bocchinfuso, R. J. (2017, March 31). POTUS Sentiment Analysis. Retrieved April 02, 2017, from http://gotitsolutions.org/2017/02/24/potus-sentiment-analysis/

Issac, L. P. (2014, January 14). SQL vs NoSQL Database Differences Explained with few Example DB. Retrieved April 02, 2017, from http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/?utm_source=tuicool

Pfleeger, C. P., Pfleeger, S. L., & Margulies, J. (2015). Security in computing (5th ed.). Upper Saddle River: Prentice Hall.

Published by rbocchinfuso on April 2, 2017 | Permalink

FIT MGT5114 – Wk4 Discussion 1 Post

Question:

Can a database contain two identical records without a negative effect on the integrity of the database? Why or why not?

Response:

I think this can be a complex question and needs to be qualified a bit. Not all databases are relational, a database could be comprised of a single table and single field with multiple rows which contain binary responses, something like “agree or disagree” responses to a question for something like sentiment analysis. An example here would be a question posed on a website where users are asked to “agree or disagree”. The user’s response is stored in a database table, and a query is used to count the “agree and disagree” responses.

An example of this is represented by the following sql statements:

sqlite> — create table
sqlite> create table sentiment (answer text);
sqlite>
sqlite> — insert data into table
sqlite> insert into sentiment (answer) values (‘agree’);
sqlite> insert into sentiment (answer) values (‘agree’);
sqlite> insert into sentiment (answer) values (‘agree’);
sqlite> insert into sentiment (answer) values (‘agree’);
sqlite> insert into sentiment (answer) values (‘agree’);
sqlite> insert into sentiment (answer) values (‘agree’);
sqlite> insert into sentiment (answer) values (‘disagree’);
sqlite> insert into sentiment (answer) values (‘disagree’);
sqlite> insert into sentiment (answer) values (‘disagree’);
sqlite>
sqlite> — query all records
sqlite> select * from sentiment;
agree
agree
agree
agree
agree
agree
disagree
disagree
disagree
sqlite>
sqlite> — query total number of responses
sqlite> select count(answer) from sentiment;
9
sqlite> — query total number or agree responses
sqlite> select count(answer) from sentiment where answer = (‘agree’);
6
sqlite> — query total number or disagree responses
sqlite> select count(answer) from sentiment where answer = (‘disagree’);
3

Above a DB table called “sentiment” is created with one field “answer”. Inserts represent data being inserted into the table “sentiment” and field “answer” to create records. The data is then used to calculate the total number of respondents, the number of respondents that agree and the number or respondents that disagree.

This is a simple example of a DB which stores data which can be mined to gather sentiment regarding the question posed to the user. In the example above there were nine total respondents, six who agree and three who disagree. I this case duplicate records are acceptable and expected with the goal of recording all responses and tabulating a count of each response type, database integrity is not affected negatively.

When the question is asked in the context of a relational database (RDBMS) the assumption that we can make is that there are relationships which are created between tables and these relationships likely rely on a unique identifier (a primary key) to ensure that records can be uniquely identified. The confusion that can be created when duplicate records exist can be demonstrated by a SQL update.

sqlite> — example of poorly designed db table with duplicate records
sqlite>
sqlite> — create table foo
sqlite> create table foo (name text, age integer);
sqlite>
sqlite> — insert data into table
sqlite> insert into foo (name,age) values (‘John’,10);
sqlite> insert into foo (name,age) values (‘Joe’,20);
sqlite> insert into foo (name,age) values (‘Jane’,30);
sqlite>
sqlite> — query db table
sqlite> select * from foo;
John|10
Joe|20
Jane|30
sqlite> select * from foo where (name) = (‘John’);
John|10
sqlite>
sqlite> — create duplicate records
sqlite> insert into foo (name,age) values (‘John’,10);
sqlite> insert into foo (name,age) values (‘Joe’,20);
sqlite>
sqlite> — query db table
sqlite> select * from foo;
John|10
Joe|20
Jane|30
John|10
Joe|20
sqlite> select * from foo where (name) = (‘John’);
John|10
John|10
sqlite>
sqlite> — create new record;
sqlite> insert into foo (name,age) values (‘Bob’,30);
sqlite>
sqlite> — query db table
sqlite> select * from foo;
John|10
Joe|20
Jane|30
John|10
Joe|20
Bob|30
sqlite> select * from foo where (age) = (30);
Jane|30
Bob|30
sqlite>
sqlite> — update John’s age to 40
sqlite> update foo set (age) = (40) where (name) = (‘John’);
sqlite>
sqlite> — query db table
sqlite> select * from foo;
John|40
Joe|20
Jane|30
John|40
Joe|20
Bob|30
sqlite> select * from foo where (name) = (‘John’);
John|40
John|40
sqlite>

Above we can see a table called “foo” consisting of two fields “name” and “age” is created, this table has names and ages added to it, with the record “John | 10” and “Joe | 20” being duplicated. An update is made to the database to change John’s age from 10 to 40. This update impacts all John’s records because there is not a unique identifier in the record which can be used. While in the previous example where I stored information for sentiment analysis I showed that it is possible to have a database where integrity is not impacted by duplicate records, in general, this is a poor design choice and can be easily fixed with the addition of a primary key.

Below you will see the subtle but powerful difference that a primary key offers.

sqlite> — proprely designed db table avoids duplicate records
sqlite>
sqlite> — create table foo with autoincrementing primary key
sqlite> create table foo (id integer primary key autoincrement, name text, age integer);
sqlite>
sqlite> — insert data into table
sqlite> insert into foo (name,age) values (‘John’,10);
sqlite> insert into foo (name,age) values (‘Joe’,20);
sqlite> insert into foo (name,age) values (‘Jane’,30);
sqlite>
sqlite> — query db table
sqlite> select * from foo;
1|John|10
2|Joe|20
3|Jane|30
sqlite> select * from foo where (name) = (‘John’);
1|John|10
sqlite>
sqlite> — create duplicate records
sqlite> insert into foo (name,age) values (‘John’,10);
sqlite> insert into foo (name,age) values (‘Joe’,20);
sqlite>
sqlite> — query db table
sqlite> select * from foo;
1|John|10
2|Joe|20
3|Jane|30
4|John|10
5|Joe|20
sqlite> select * from foo where (name) = (‘John’);
1|John|10
4|John|10
sqlite>
sqlite> — create new record;
sqlite> insert into foo (name,age) values (‘Bob’,30);
sqlite>
sqlite> — query db table
sqlite> select * from foo;
1|John|10
2|Joe|20
3|Jane|30
4|John|10
5|Joe|20
6|Bob|30
sqlite> select * from foo where (age) = (30);
3|Jane|30
6|Bob|30
sqlite>
sqlite> — update John’s age to 40 where id = N
sqlite> update foo set (age) = (40) where (id) = (4);
sqlite>
sqlite> — query db table
sqlite> select * from foo;
1|John|10
2|Joe|20
3|Jane|30
4|John|40
5|Joe|20
6|Bob|30
sqlite> select * from foo where (name) = (‘John’);
1|John|10
4|John|40
sqlite>

I the above example additional field “id” is added as a primary key, this is not a user entered field but a field that is auto generated and guarantees that each record is unique and uniquely identifiable. This subtle design change allows the manipulation of a John with the ID = 4. While the sentiment example I gave works as is and there is no posed threat to data integrity, the addition of a unique id as a primary key would be a welcomed and desirable design change.

My apologies for all the SQL but I thought the best way to convey my thoughts would be to use examples. I think the easy answer here would have been just to say NO, that a database can NOT contain two identical records without a negative effect on the integrity of the database, but it think the answer is “it depends”. With this said I think it is a best practice to have a way to uniquely identify database records because the inability to manipulate data programmatically can create some serious issues. Additionally, schema extensions or changes are very difficult when a primary key does not exist. Finally the inability to uniquely identify records or elements greatly impacts the ability to apply security paradigms.

References

Pfleeger, C. P., Pfleeger, S. L., & Margulies, J. (2015). Security in computing (5th ed.). Upper Saddle River: Prentice Hall.

SQLite Home Page. (n.d.). Retrieved March 28, 2017, from https://www.sqlite.org/

SQL code used in above examples:

— example of poorly designed db table with duplicate records

— create table foo
create table foo (name text, age integer);

— insert data into table
insert into foo (name,age) values (‘John’,10);
insert into foo (name,age) values (‘Joe’,20);
insert into foo (name,age) values (‘Jane’,30);