Demystifying Data Analysis



Many quantities studied by statisticians and researchers are approximately normally distributed. For such a variable, the probability of an outcome in the population can be found with the standard normal distribution. For instance, suppose the download time of a commercial tax preparation website is normally distributed with a mean of 2.0 seconds and a standard deviation of 0.5 seconds. The first step is to standardize the observed value (convert it to a z-score); the probability is then read from the standard normal (Z) table. The questions below are answered using this mean and standard deviation. What is the probability that the download time is:

(a) Above 1.8 seconds?

Solution

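The z-score of 1.8 seconds and the resulting probability are:

$Z = \frac{X - \mu}{\sigma} = \frac{1.8 - 2.0}{0.5} = -0.40$

$P(X > 1.8) = P(Z > -0.40) = 1 - 0.3446 = 0.6554$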

In the diagram above, the shaded region to the right of 1.8 seconds represents the probability that the download time is greater than 1.8 seconds.
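The standardize-then-look-up procedure can be reproduced in a few lines. This is an illustrative sketch, not the author's method: it swaps the printed Z table for the standard-library `statistics.NormalDist` (Python 3.8+), and the helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

MU = 2.0     # mean download time (seconds)
SIGMA = 0.5  # standard deviation of download time (seconds)


def z_score(x: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Standardize an observed value: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


z = z_score(1.8)
# P(X > 1.8) = P(Z > z) = 1 - Phi(z), where Phi is the standard normal CDF
p_above = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, P(X > 1.8) = {p_above:.4f}")  # z = -0.40, P(X > 1.8) = 0.6554
```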

(b) Between 1.5 and 2.5 seconds?

Solution

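Standardizing both limits gives

$P(1.5 < X < 2.5) = P\left(\frac{1.5 - 2.0}{0.5} < Z < \frac{2.5 - 2.0}{0.5}\right) = P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826$

so about 68% of download times fall between 1.5 and 2.5 seconds, that is, within one standard deviation of the mean.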
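As a cross-check on the table lookup, the same probability can be computed as the difference of two CDF values. This sketch assumes Python 3.8+ for `statistics.NormalDist`; `prob_between` is a hypothetical helper name.

```python
from statistics import NormalDist

# Download time X ~ Normal(mean 2.0 s, standard deviation 0.5 s)
download_time = NormalDist(mu=2.0, sigma=0.5)


def prob_between(dist: NormalDist, low: float, high: float) -> float:
    """Return P(low < X < high) as CDF(high) - CDF(low)."""
    return dist.cdf(high) - dist.cdf(low)


p = prob_between(download_time, 1.5, 2.5)
print(f"P(1.5 < X < 2.5) = {p:.4f}")
```

The exact value is about 0.6827. A table-based calculation gives 0.8413 - 0.1587 = 0.6826; the tiny difference comes from the table's four-decimal rounding.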

Solution

We first find the z-score whose cumulative probability is 0.99. From the standard normal table, this z-score is

$Z_{0.99} \approx 2.33$

The corresponding download time is obtained by substituting the z-score, mean and standard deviation into the z-score formula and solving for X:

$X = \mu + Z\sigma = 2.0 + 2.33(0.5) = 3.165 \approx 3.2$ seconds

The result implies that 99% of all download times are faster than (shorter than) about 3.2 seconds; only 1% take longer.
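Finding a value from a probability is a quantile (inverse CDF) calculation. The sketch below, again using `statistics.NormalDist` in place of the printed table, solves $X = \mu + Z\sigma$ with the unrounded z-value, which gives about 3.16 seconds; rounding z to 2.33 as the table does gives a slightly larger value that rounds to 3.2 seconds.

```python
from statistics import NormalDist

download_time = NormalDist(mu=2.0, sigma=0.5)

# z-value with cumulative probability 0.99 (the printed table gives about 2.33)
z_99 = NormalDist().inv_cdf(0.99)

# Solve Z = (X - mu) / sigma for X
x_99 = download_time.mean + z_99 * download_time.stdev

# The same value in one step: the 99th percentile of the download-time distribution
assert abs(download_time.inv_cdf(0.99) - x_99) < 1e-9

print(f"z = {z_99:.3f}, X = {x_99:.2f} seconds")  # z = 2.326, X = 3.16 seconds
```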

Part B

The analysis of raw data matters not only to researchers but also to the public, since inferences can be drawn from the results. Decision makers can also use such results to formulate policies and forecast future events. In this part, data on the cost of electricity in a large city are used to determine whether the cost is normally distributed. The analysis uses a sample of 50 observations from 2005. The results are intended to help researchers judge whether the electricity company charges households fairly or is biased in its charges.

Box Plot

The box plot is a common visual aid for checking how the observations in a sample are distributed. It shows whether the data are skewed or roughly symmetric, as normally distributed data should be, by comparing the position of the median with the first and third quartiles and the lengths of the two whiskers.[1]

To construct it, we first arrange the data in ascending order and then determine the quartiles.

Quartiles

From the box plot above, the utility data appear approximately normally distributed, because the two whiskers are of roughly equal length. Moreover, the median lies almost midway between the first and third quartiles, which means that the data are spread evenly on both sides of the median.
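The two steps described above (sorting, then reading off the quartiles) and the two visual checks (equal whiskers, centered median) can be expressed as a small function. This is a sketch under stated assumptions: the 50 raw observations are not reproduced in the essay, so the example runs on a short hypothetical list; the whiskers extend to the minimum and maximum (no outlier fences); and `method="inclusive"` is only one of several quartile conventions, so other software may report slightly different quartiles. The quartile (Bowley) skewness is an addition not used in the essay: it is 0 when the median sits exactly midway between Q1 and Q3.

```python
from statistics import quantiles
from typing import Dict, Sequence


def box_plot_summary(data: Sequence[float]) -> Dict[str, float]:
    """Five-number summary plus simple symmetry checks for a box plot."""
    xs = sorted(data)  # step 1: arrange the data in ascending order
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")  # step 2: quartiles
    spread = q3 - q1
    return {
        "min": xs[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": xs[-1],
        # Whiskers drawn to the minimum and maximum (no outlier fences)
        "lower_whisker": q1 - xs[0],
        "upper_whisker": xs[-1] - q3,
        # Quartile (Bowley) skewness: 0 when the median is midway between Q1 and Q3
        "quartile_skewness": (q3 + q1 - 2 * med) / spread if spread else 0.0,
    }


# Hypothetical, perfectly symmetric values used only to demonstrate the function
summary = box_plot_summary([50, 10, 90, 30, 70, 20, 80, 40, 60])
print(summary)
```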

Histogram

Excel was also used to build a frequency table and a histogram of the electricity costs in the sample.

Bin upper limit ($)   Frequency   Cumulative %
82                    1           2.00%
101                   3           8.00%
119                   7           22.00%
138                   8           38.00%
157                   12          62.00%
176                   10          82.00%
194                   5           92.00%
213                   3           98.00%
232                   1           100.00%

The frequency table above was used to draw the histogram of electricity costs below.

The histogram above shows that the electricity costs in the sample are approximately normally distributed. The numbers of observations on the left and right of the center are roughly equal: 38% of the observations fall below the tallest bin (138 to 157) and 38% fall above it. Most of the electricity bills in the city therefore cluster around the mean. There are no obvious extreme observations, since the frequencies decline gradually in both tails rather than leaving isolated bars far from the rest of the data.
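The balance between the two sides of the histogram can be checked directly from the cumulative percentages in the frequency table. In this sketch, the sample size of 50 comes from the text, the bin values are the table's bin upper limits, and comparing the counts below and above the tallest bin is an illustrative check rather than a formal test of normality.

```python
# Bin upper limits ($) and cumulative percentages from the frequency table
bins = [82, 101, 119, 138, 157, 176, 194, 213, 232]
cumulative_pct = [2, 8, 22, 38, 62, 82, 92, 98, 100]
n = 50  # sample size stated in the text

# Convert cumulative percentages to cumulative counts, then difference them
cumulative_counts = [round(p * n / 100) for p in cumulative_pct]
frequencies = [
    count - previous
    for count, previous in zip(cumulative_counts, [0] + cumulative_counts[:-1])
]

# Compare how many observations fall on each side of the tallest bin
peak = frequencies.index(max(frequencies))
below = sum(frequencies[:peak])
above = sum(frequencies[peak + 1:])
print(f"tallest bin ends at ${bins[peak]}")  # tallest bin ends at $157
print(frequencies)   # [1, 3, 7, 8, 12, 10, 5, 3, 1]
print(below, above)  # 19 19
```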

Theoretical Properties

Descriptive Statistics: Utility Charge ($)

Mean                       147.06
Standard Error             4.4818
Median                     148.5
Mode                       130
Standard Deviation         31.6914
Sample Variance            1004.3433
Kurtosis                   -0.5442
Skewness                   0.0158
Range                      131
Minimum                    82
Maximum                    213
Sum                        7353
Count                      50
Confidence Level (95.0%)   9.0066

From the table above, the mean ($147.06) is slightly less than the median ($148.50), by $1.44. This suggests that the electricity bills in the sample are centered close to the average value of $147.06, with little skew in either direction. The skewness of 0.0158 is close to zero, indicating that the left and right tails are nearly symmetric. The kurtosis of -0.5442 (Excel reports excess kurtosis, which is 0 for a normal distribution) suggests tails somewhat lighter than those of a normal distribution.
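The entries in the descriptive statistics table are linked by simple formulas, so they can be checked against each other. The sketch below recomputes the mean from the sum and count, the standard error as $s/\sqrt{n}$, and the 95% confidence half-width as $t \times SE$. The critical value $t_{0.025,\,49} \approx 2.0096$ is taken from a t table (the Python standard library has no t distribution), and small differences from the table come from using the rounded standard deviation.

```python
import math

n = 50              # count
total = 7353        # sum of the 50 bills ($)
sd = 31.6914        # sample standard deviation ($)
median_bill = 148.5

mean = total / n            # sample mean
se = sd / math.sqrt(n)      # standard error of the mean
variance = sd ** 2          # sample variance (from the rounded SD)
t_crit = 2.009575           # two-sided 95% t critical value, 49 degrees of freedom
half_width = t_crit * se    # Excel's "Confidence Level (95.0%)"

print(f"mean = {mean:.2f}, median - mean = {median_bill - mean:.2f}")
print(f"SE = {se:.4f}, variance = {variance:.1f}, 95% half-width = {half_width:.4f}")
```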

Inter-quartile Range (IQR)

According to Ott and Longnecker (2015), the interquartile range (IQR) is the difference between the third and the first quartiles, $IQR = Q_3 - Q_1$.[2]

In this sample, the IQR is about 1.33 times the standard deviation, close to the ratio of roughly 1.35 expected for a normal distribution.
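The normal-theory benchmark for this ratio comes from the normal quantiles: for a normal distribution, $Q_1 = \mu - 0.6745\sigma$ and $Q_3 = \mu + 0.6745\sigma$, so the IQR equals about $1.349\sigma$ whatever the mean and standard deviation. A short sketch using the standard library:

```python
from statistics import NormalDist

# Upper-quartile z-value of the standard normal distribution (about 0.6745)
z_q3 = NormalDist().inv_cdf(0.75)

# IQR / SD for any normal distribution: (mu + z*sigma) - (mu - z*sigma) = 2*z*sigma
iqr_over_sd = 2 * z_q3
print(f"theoretical IQR / SD = {iqr_over_sd:.3f}")  # theoretical IQR / SD = 1.349

# IQR that a normal distribution with this sample's SD would have
sample_sd = 31.6914
print(f"expected IQR for SD {sample_sd}: {iqr_over_sd * sample_sd:.1f}")
```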

Range

The range is 131 (see the descriptive statistics table above). To compare the range with the standard deviation, we divide the former by the latter:

$\frac{\text{Range}}{s} = \frac{131}{31.6914} \approx 4.13$

Thus, the range is about 4.13 times the standard deviation.
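The ratio itself is a single division. For context, the sketch below also estimates by simulation how many sample standard deviations the range of a normal sample of 50 typically spans. This comparison is an addition, not part of the essay's analysis; the number of simulated samples and the seed are arbitrary choices, and the estimate (roughly 4.5) varies slightly with them.

```python
import random
import statistics

sample_range = 131    # maximum - minimum ($)
sample_sd = 31.6914   # sample standard deviation ($)
ratio = sample_range / sample_sd
print(f"observed range / SD = {ratio:.2f}")  # observed range / SD = 4.13

# Monte Carlo estimate of range / SD for samples of 50 normal observations
rng = random.Random(2005)  # arbitrary seed for reproducibility
ratios = []
for _ in range(2000):
    draws = [rng.gauss(0.0, 1.0) for _ in range(50)]
    ratios.append((max(draws) - min(draws)) / statistics.stdev(draws))
print(f"typical range / SD for n = 50: {statistics.mean(ratios):.2f}")
```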

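Empirical Rule

Interval          Cost range ($)      Observed share   Expected if normal
Mean ± 1 SD       115.37 to 178.75    66%              68%
Mean ± 1.28 SD    106.50 to 187.62    80%              80%
Mean ± 2 SD       83.68 to 210.44     at most 96%      95%

Therefore, 66% of the data points lie in the interval (115.37, 178.75), that is, within one standard deviation of the mean on either side, close to the 68% predicted by the empirical rule. Likewise, 80% of the data points lie within 1.28 standard deviations of the mean, in the interval (106.50, 187.62), matching the 80% expected for a normal distribution. Finally, the minimum (82) and the maximum (213) both lie more than two standard deviations from the mean, so at least 4% of the observations fall outside (83.68, 210.44), in line with the roughly 5% expected under normality.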
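The interval endpoints, and the shares a normal distribution would place inside them, follow from the mean and standard deviation alone. A sketch using the standard library (the raw observations are not reproduced in the essay, so the observed shares still have to be counted from the data):

```python
from statistics import NormalDist

mean, sd = 147.06, 31.6914  # sample mean and standard deviation ($)
std_normal = NormalDist()

results = []
for k in (1.0, 1.28, 2.0):
    low, high = mean - k * sd, mean + k * sd
    # Probability that a normal variable lies within k standard deviations of its mean
    expected_share = std_normal.cdf(k) - std_normal.cdf(-k)
    results.append((k, round(low, 2), round(high, 2), expected_share))
    print(f"mean +/- {k} SD: ({low:.2f}, {high:.2f}), expected share {expected_share:.1%}")
```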

Normal Probability Plot

Number  Ordered X  Ordered probability  Ordered Z
1       82         0.01                 -2.32635
2       (missing)  0.03                 -1.88079
3       (missing)  0.05                 -1.64485
4       (missing)  0.07                 -1.47579
5       102        0.09                 -1.34076
6       108        0.11                 -1.22653
7       109        0.13                 -1.12639
8       111        0.15                 -1.03643
9       114        0.17                 -0.95417
10      116        0.19                 -0.8779
11      119        0.21                 -0.80642
12      123        0.23                 -0.73885
13      127        0.25                 -0.67449
14      128        0.27                 -0.61281
15      129        0.29                 -0.55338
16      130        0.31                 -0.49585
17      130        0.33                 -0.43991
18      135        0.35                 -0.38532
19      137        0.37                 -0.33185
20      139        0.39                 -0.27932
21      141        0.41                 -0.22754
22      143        0.43                 -0.17637
23      144        0.45                 -0.12566
24      147        0.47                 -0.07527
25      148        0.49                 -0.02507
26      149        0.51                 0.025069
27      149        0.53                 0.07527
28      150        0.55                 0.125661
29      151        0.57                 0.176374
30      153        0.59                 0.227545
31      154        0.61                 0.279319
32      157        0.63                 0.331853
33      158        0.65                 0.38532
34      163        0.67                 0.439913
35      165        0.69                 0.49585
36      166        0.71                 0.553385
37      167        0.73                 0.612813
38      168        0.75                 0.67449
39      171        0.77                 0.738847
40      172        0.79                 0.806421
41      175        0.81                 0.877896
42      178        0.83                 0.954165
43      183        0.85                 1.036433
44      185        0.87                 1.126391
45      187        0.89                 1.226528
46      191        0.91                 1.340755
47      197        0.93                 1.475791
48      202        0.95                 1.644854
49      206        0.97                 1.880794
50      213        0.99                 2.326348

In the graph above, the plotted points lie close to a straight trend line, which supports the conclusion that the data are approximately normally distributed. By contrast, non-normal data typically produce a curved or S-shaped pattern on a normal probability plot.
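The last two columns of the table follow a simple recipe: the i-th smallest of n observations is assigned the ordered probability $(i - 0.5)/n$ and the standard normal quantile of that probability. The sketch below rebuilds those columns for n = 50 with `statistics.NormalDist`; `normal_scores` is a hypothetical helper name, and plotting each ordered X against its ordered Z gives the normal probability plot.

```python
from statistics import NormalDist
from typing import List, Tuple


def normal_scores(n: int) -> List[Tuple[int, float, float]]:
    """Return (rank, ordered probability, ordered Z) for ranks 1..n."""
    std_normal = NormalDist()
    rows = []
    for i in range(1, n + 1):
        p = (i - 0.5) / n                            # ordered probability
        rows.append((i, p, std_normal.inv_cdf(p)))  # ordered Z
    return rows


rows = normal_scores(50)
for rank, p, z in rows[:3]:
    print(f"{rank:2d}  {p:.2f}  {z:.5f}")
```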

Conclusion

Based on the analyses above, the cost of electricity in the sample is approximately normally distributed. This suggests that the company supplying the electricity charges its customers fairly. It can therefore be concluded that the cost of electricity for individuals living in one-bedroom homes in the city under study follows an approximately normal distribution. However, this result cannot be used to judge the normality of electricity costs in other cities in the country, because rates vary from one city to another.

Bibliography

Ott, R. Lyman, and Michael T. Longnecker. An Introduction to Statistical Methods and Data Analysis. Nelson Education, 2015.

Triola, Mario F. Elementary Statistics. Reading, MA: Pearson/Addison-Wesley, 2006.

[1] Mario F. Triola, Elementary Statistics (Reading, MA: Pearson/Addison-Wesley, 2006).

[2] R. Lyman Ott and Michael T. Longnecker, An Introduction to Statistical Methods and Data Analysis (Nelson Education, 2015).

September 25, 2023

