Chapter 2: Exploring Data with Graphs
STATISTICS EXPLORATION # 1
SOME GRAPHICAL DESCRIPTIVE METHODS
PURPOSE - to use MINITAB to
distinct values in the data set and their corresponding number of
Note: We can also have horizontal projection graphs. In such cases, the y-axis (vertical axis) will be the base.
An example of a frequency projection graph is shown in Figure 1.1. Note the frequencies are located at the top of the vertical lines.
Figure 1.1: Example of a Frequency Projection Graph
An example of a bar chart is shown in Figure 1.2. Note the frequencies are located at the top of the vertical boxes.
Figure 1.2: Example of a Bar Chart
A frequency polygon for ungrouped data is a graph that displays the frequencies for the individual data values as points and then uses straight line segments to connect the plotted points.
An example of a frequency polygon is shown in Figure 1.3. Note the frequencies for the individual values are located where two line segments intersect.
Figure 1.3: Example of a Frequency Polygon
An example of a pie chart is shown in Figure 1.4. Note the frequencies (and percentages) for the individual categories (grades A, B, C, D, and E in this case) are located near the corresponding sectors ("slices") of the "pie".
Figure 1.4: Example of a Pie Chart
An example of a histogram is shown in Figure 1.5. Observe that the number of values (frequency count) in each class is located at the top of the rectangle that is associated with the class.
Figure 1.5: Example of a Histogram
A stem and leaf plot is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes. Observe that the actual data values are used in creating the plot. Hence, the plot contains more information than a histogram.
An example of a stem and leaf plot is shown in Figure 1.6. The first row of values is -26, -25, and -25. These values are separated into two parts. For example, the value of -26 is divided into two parts: the stem (-2) and the leaf (6). The stems are the tens digits and the leaves are the units digits.
Figure 1.6: Example of a Stem & Leaf Plot
Note: When the values in the data set are large, you can use the first two, three, or more digits of each number to form the stems, with the units digits forming the leaves. However, there are other ways to form the stems and leaves.
A scatter plot (scatter diagram) is a two-dimensional rectangular plot of a set of ordered pairs (paired values).
An example of a scatter plot (diagram) is shown in Figure 1.7.
Figure 1.7: Example of a Scatter Plot (Diagram)
First, load the MINITAB (windows version) software as described in Exploration #0.
Note: The procedures presented in these explorations may not be the only way to achieve the end results. Also, whenever graphs are presented, only the MINITAB graphics features will be used.
1. CONSTRUCTING FREQUENCY TABLES
Example 1: Use MINITAB to construct an ungrouped frequency, cumulative frequency, relative frequency, and cumulative relative frequency TABLE for the variable Cartoon1 from the Cartoon.mtw data worksheet. This worksheet comes with the MINITAB program.
To use the Cartoon.mtw worksheet, you first need to open it. Select File-> Open Worksheet. The Open Worksheet dialog box will be displayed. Use the mouse to select the file named Cartoon.mtw. It will then be listed in the File Name box. The Open Worksheet dialog box is shown in Figure 1.8. Note: If the data files do not appear as in Figure 1.8, double-click on the Data folder to display the files.
Figure 1.8: Open Worksheet Dialog Box
Click on the OK button. The Data window will display the Cartoon data set. Observe that the variable Cartoon1 is in column C6 and there are 179 observed values.
To create tables in MINITAB, select Stat-> Tables-> Tally. The Tally dialog box will appear. Click on C6 Cartoon1, then click on the Select button. The variable Cartoon1 will appear in the Variables text box. Next, select Counts, Percents, Cumulative Counts, and Cumulative Percents by clicking on the boxes. A check mark (✓) in the box will indicate that you have made that selection. This is shown in Figure 1.9. Note that Counts in the Tally box represents the frequency count, and Percents represents the relative frequency expressed as a percent.
Note: The relative frequency for a given data value is computed by dividing the frequency count (f) of the data value by the total number of values in the data set (sample size, n).
Notation: Relative frequency = $\frac{f}{n}$.
Figure 1.9: Tally Dialog Box
Select the OK button and the tables will be displayed in the Session window as shown in Figure 1.10. The first column lists the values of the variable Cartoon1, the second column lists the frequency counts, the third column lists the cumulative frequency counts, the fourth column lists the relative frequencies as a percent, and the last column lists the cumulative relative frequencies as a percent for the values of the variable.
Figure 1.10: Tally Table for the Cartoon1 Variable
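The four quantities in the Tally table can also be computed by hand. The following Python sketch is not part of MINITAB; it only illustrates the arithmetic behind the Counts, CumCnt, Percent, and CumPct columns. The function name `tally` and the sample scores are hypothetical and are not the Cartoon1 data.

```python
from collections import Counter
from itertools import accumulate


def tally(values):
    """Return rows of (value, count, cum_count, percent, cum_percent)."""
    counts = Counter(values)            # frequency count f of each distinct value
    n = len(values)                     # sample size n
    distinct = sorted(counts)
    cum_counts = accumulate(counts[v] for v in distinct)
    rows = []
    for v, cum in zip(distinct, cum_counts):
        f = counts[v]
        # Relative frequency f / n, expressed as a percent and rounded to 2 places
        rows.append((v, f, cum, round(100 * f / n, 2), round(100 * cum / n, 2)))
    return rows


# Hypothetical scores, made up for illustration (NOT the Cartoon1 data)
sample = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
for row in tally(sample):
    print("%5d %6d %7d %8.2f %8.2f" % row)
```

As a check against the Session window output, a value that occurs 3 times among 179 observations has a relative frequency of 3/179, which rounds to 1.68 percent.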
Note: If your computer is connected to a printer with the appropriate printer driver, you can print any active window by selecting File-> Print Window.
CAUTION: Be careful to highlight the portion of the window you want to print before printing any window. Otherwise, the entire window will be printed, including information you do not need.
To display this set of information in the Data window so that you can construct appropriate graphs and charts, make the Session window the active window by clicking on it. Next, highlight the first row (0 3 3 1.68 1.68) to the last row (9 38 179 21.23 100.00) using the mouse. To copy the highlighted part of the Session window, select Edit-> Copy. We will place this information into a new worksheet. To achieve this, select File-> New and the New dialog box will be displayed as shown in Figure 1.11. Select Minitab Worksheet and press the OK button. A new worksheet will be displayed.
Figure 1.11: New Dialog Box
This new worksheet will be active. Click on the cell corresponding to row 1 and column C1 and select Edit-> Paste Cells. A MINITAB dialog box will appear. Make sure you select Use spaces as delimiters and click on the OK button. The MINITAB dialog box is shown in Figure 1.12.
Figure 1.12: MINITAB Dialog Box
The values will be inserted into columns C1 through C5. This worksheet with the data is shown in Figure 1.13. Note the columns have been renamed.
Figure 1.13: Worksheet with Table for the Cartoon1 Variable
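The effect of the Use spaces as delimiters option can be pictured as splitting each copied line at the spaces and placing the pieces into successive columns. The Python sketch below is only an illustration of that idea, not a description of how MINITAB works internally. The function name `parse_session_rows` is hypothetical, and only the first and last rows quoted above are used; the rows in between are omitted.

```python
def parse_session_rows(text):
    """Split each non-empty line on whitespace and convert the fields to numbers."""
    rows = []
    for line in text.splitlines():
        fields = line.split()           # spaces act as the delimiters
        if not fields:
            continue                    # skip blank lines
        value, count, cum_count = (int(x) for x in fields[:3])
        percent, cum_percent = (float(x) for x in fields[3:5])
        rows.append((value, count, cum_count, percent, cum_percent))
    return rows


# First and last rows of the Tally output, as quoted in the text
copied = """0 3 3 1.68 1.68
9 38 179 21.23 100.00"""

# Transpose the rows so that each tuple corresponds to one worksheet column
columns = list(zip(*parse_session_rows(copied)))
print(columns)
```

Each of the five tuples in `columns` plays the role of one of the worksheet columns C1 through C5.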
Recall that a line (projection) graph uses the length of vertical (horizontal) lines to represent the frequencies or the probabilities of the values along the base.
Example 2: Construct a projection (line) graph for the variable Cartoon1.
First, select Graph-> Plot. In the Graph Variables box, select C2 (Count) for Y and C1 (Cartoon1) for X. In the Data Display drop down Display box, select Project. The Plot dialog box with the appropriate selections is shown in Figure 1.14. This sequence of commands will construct a projection graph with the frequencies along the Y-axis and the data values along the X-axis.
Figure 1.14: The Plot Dialog Box with Selections to plot a Projection Graph for Example 2
Click on the OK button and the projection graph for the variable Cartoon1 will be displayed. This projection graph is shown in Figure 1.15. Note: The graph has been edited such that the projection lines can be viewed clearly.
Observe in Figure 1.14 that there are other options in the dialog box. For now, we will ignore them and use only the options needed to generate our graph. As we gain experience with the software, we will use the other options to display other characteristics of the graph.
Figure 1.15: Projection graph for Example 2
Observe that the shape of the projection graph displays some skewness to the left.
Recall that a frequency polygon for ungrouped data is a graph that displays the frequencies for the individual data values as points and then uses straight line segments to connect the plotted points.
Example 3: The following table summarizes the chest sizes of Scottish militiamen in the early 19th century. Chest sizes were measured in inches, and associated with each chest size observation is the number (frequency or count) of soldiers with that chest size.
Construct a frequency polygon graph for the variable Chest.
First, enter the values for Count and Chest into two columns in your data worksheet. To construct a frequency polygon for the variable Chest, follow the above procedure for constructing a projection graph. Select (Count) for Y and (Chest) for X. In the Data Display drop down Display box select Connect. The frequency polygon is shown in Figure 1.16.
Figure 1.16: Frequency Polygon for Example 3
Observe that the shape of the polygon is approximately "bell" shaped.
Recall that a bar graph (bar chart) uses vertical or horizontal bars to represent the frequencies of the individual categories when the data is qualitative.
Example 4: A sample of 300 college students were asked to indicate their favorite soft drink. The survey results are shown in the following table. Use MINITAB to construct a bar graph for the data.
To construct a bar graph (chart) for the data, enter the information in two separate columns. You can label them as DRINK and COUNT. To label the columns, type the names in the cells below the labels C1, C2, C3, etc., and above the first row of the data worksheet. An example of where the labels are inserted is shown in Figure 1.13. Select Graph-> Chart and the Chart dialog box will be displayed. Fill in the appropriate boxes as shown in Figure 1.17. Click on the OK button once the appropriate boxes are selected. The bar chart for the data will be generated and is shown in Figure 1.18. Observe that the boxes for the different categories do not adjoin one another. Observe also that the box associated with Pepsi is the tallest, indicating that Pepsi was the number one preference among the students in this survey.
Figure 1.17: Chart Dialog Box for Example 4
Figure 1.18: Bar Chart for Example 4
Recall that a pie chart is often used to plot relative frequencies (percentages) when the data are qualitative. A circle is constructed and divided up into sectors whose areas are proportional to the frequencies or relative frequencies (percentages).
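Since the area of a sector is proportional to its central angle, each category receives a share of the 360 degrees equal to its relative frequency. The following Python sketch illustrates this calculation under that assumption. The function name `pie_sectors` and the grade counts are hypothetical and are not taken from any figure or table in this exploration.

```python
def pie_sectors(counts):
    """Map each category to (percentage, central angle in degrees)."""
    total = sum(counts.values())        # total number of observations, n
    return {
        category: (100 * f / total, 360 * f / total)
        for category, f in counts.items()
    }


# Hypothetical grade counts, made up for illustration
grades = {"A": 5, "B": 10, "C": 20, "D": 10, "E": 5}
for category, (percent, angle) in pie_sectors(grades).items():
    print(f"{category}: {percent:.1f}% -> {angle:.1f} degrees")
```

The angles always add up to 360 degrees, just as the percentages add up to 100.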
Example 5: A survey was conducted on a university campus regarding the monthly income of the students. The responses of 500 students are summarized in the following table.
To construct a pie graph (chart) for the data, enter the information in two separate columns. You can label them as INCOME and COUNT. Use the classifications of D, L, B, and O in the INCOME column. Select Graph-> Pie Chart and the Pie Chart dialog box will be displayed. Fill in the appropriate boxes as shown in Figure 1.19. Note that we selected the option of Chart Table since we have the categories in the INCOME column and the frequencies in the COUNT column. Click on the OK button once the appropriate boxes are selected. The pie chart for the data will be generated and is shown in Figure 1.20. Observe that associated with the "slices" of the pie are the corresponding frequency counts and the equivalent percentages.
Figure 1.19: Pie Chart Dialog Box for Example 5
Figure 1.20: Pie Chart for Example 5
6. CONSTRUCTING A HISTOGRAM
Recall a histogram is a graph in which quantitative data are divided into groups or classes, with the classes plotted along the horizontal axis. Rectangles are constructed for these classes with the rectangles placed adjacent to each other. The heights of the rectangles represent the frequencies for the classes. The vertical axis of a histogram can represent either the class frequency or the relative class frequency. In addition, information is lost when a histogram is constructed since we do not know the actual values that are contained in each class.
Example 6: Use MINITAB to construct a histogram for the variable Math from the Grades.mtw data worksheet. This worksheet comes with the MINITAB program.
Review Example 1 to see how to open a MINITAB worksheet.
To construct a histogram for the variable MATH, select Graph-> Histogram. In the Graph variables box, select MATH and in the Data Display box, select Bar. Select Options and in the Histogram Options box, select Frequency for the Type of Histogram. For the Type of Intervals select CutPoint and for the Number of intervals, type 7 into the text box. That is, the histogram will be constructed with seven intervals with the x-axis values located at the right end-points of the intervals. Click on the OK button and the histogram will be generated. Figure 1.21 shows the Histogram dialog box. Figure 1.22 shows the resulting histogram.
Figure 1.21: Histogram Dialog Box for Example 6
Observe that the frequency count for each interval is located at the top of each box. This can be achieved by clicking on the Annotation button in the Histogram dialog box shown in Figure 1.21. Select the Data Labels option and in the Data Labels dialog box, select Show data labels. This procedure will allow the data labels (frequency counts) to be displayed as in Figure 1.22.
Figure 1.22: Histogram for Example 6
Observe that the shape of the histogram is approximately symmetric (bell shaped).
7. CONSTRUCTING STEM AND LEAF PLOTS
Recall that a stem-and-leaf plot is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes. When the actual data values are used in creating the plot, it contains more information than a histogram, since a histogram shows only the frequency count in each interval (class) and not the actual values themselves.
Example 7: Use MINITAB to construct a stem-and-leaf plot for the variable Math from the Grades.mtw data worksheet. This worksheet comes with the MINITAB program and was used in Example 6.
To construct stem and leaf plots for the variable MATH, select Graph-> Stem-and-Leaf. In the Variables text box that appears, select MATH. If you select OK, a stem-and-leaf display will be shown in the Session window. You can select different lengths of intervals for the display from the Increment box. For Figure 1.23, an increment of 20 was used. Note that the leaf unit is 10. That is, for example, the first value of 4 4 is equivalent to 440; the value of 7 8 is equivalent to 780 etc.
Figure 1.23: A Stem-and-Leaf for the MATH Variable with Leaf Unit = 10
Note: Figure 1.24 shows another stem-and-leaf graph for the same MATH scores. Observe that in this case the leaf unit is one. Thus, the first value of 44 1 is equivalent to 441, the minimum score. Similarly, the last value of 80 0 is equivalent to 800, the maximum score.
Stem-and-Leaf Display: Math
Stem-and-leaf of Math N = 200
Leaf Unit = 1.0
   1  44  1
   2  47  1
   3  49  0
   5  50  09
   6  51  5
  10  52  6689
  11  53  8
  12  54  7
  17  55  56777
  21  56  1677
  28  57  3555566
  34  58  334466
  38  59  2333
  52  60  11222222255559
  64  61  111111444448
  80  62  0011114444469999
  90  63  0044558889
(11)  64  33335777799
  99  65  223333668
  90  66  12444555558889
  76  67  22224447777
  65  68  1112336
  58  69  2222355
  51  70  0001111115
  41  71  0000000599999
  28  72  4444999
  21  73  47778
  16  74  6688
  12  75  35
  10  76  44
   8  77  1377
   4  78  2
   3  79  6
   2  80  00
Figure 1.24: A Stem-and-Leaf for the MATH Variable with Leaf Unit = 1
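The way a value is split into a stem and a leaf for a given leaf unit can be sketched in a few lines of Python. This is only an illustration and not MINITAB's own procedure: it assumes non-negative whole-number data, it drops the digits below the leaf unit by truncation (as in the 4 4 for 440 reading above), and it does not split stems by an increment. The function name `stem_and_leaf` is hypothetical. The six scores are the six smallest values read from Figure 1.24.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Group non-negative integers by stem; the leaf is the digit in the leaf_unit place."""
    groups = defaultdict(list)
    for v in sorted(values):
        scaled = v // leaf_unit         # drop the digits below the leaf unit (truncation)
        stem, leaf = divmod(scaled, 10)
        groups[stem].append(leaf)
    return dict(groups)


# The six smallest MATH scores, read from Figure 1.24
scores = [441, 471, 490, 500, 509, 515]

for unit in (1, 10):
    print(f"Leaf Unit = {unit}")
    for stem, leaves in stem_and_leaf(scores, leaf_unit=unit).items():
        print(f"{stem:3d}  " + "".join(str(d) for d in leaves))
```

With a leaf unit of 1 the stems are 44, 47, 49, 50, and 51, as in Figure 1.24. With a leaf unit of 10 the same scores collapse onto the stems 4 and 5, and the units digits are no longer visible.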
8. CONSTRUCTING A SCATTER PLOT or SCATTER DIAGRAM
Recall that a scatter diagram or scatter plot is a two-dimensional rectangular plot of a set of ordered pairs (paired values).
A scatter plot, like a histogram, is a good visual means of understanding patterns in bivariate numerical data. Construction of a scatter plot is straightforward: each point on a scatter plot corresponds to one bivariate observation. That is, each point corresponds to an (x, y) pair.
A scatter plot provides a visual means of seeing relationships between the two variables. The relationship is said to be positive if an increase in one variable corresponds to an increase in the other. When one variable increases and the other decreases, we say that the relationship is negative.
Generally, a scatter plot can tell us whether the relationship (or pattern) of the bivariate data has a:
Positive (negative) linear relationship
Positive (negative) curved relationship
Note: Scatter plots can only be constructed for quantitative variables.
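The direction of a relationship can also be checked numerically. The Python sketch below uses a technique that is not described in this exploration: it takes the sign of the sum of the products of the deviations of x and y from their means. That sum is positive when large x values tend to go with large y values and negative when they tend to go with small y values. The function name `relationship_direction` and both data sets are hypothetical. The check only describes a straight-line tendency, so a curved pattern still has to be judged from the plot itself.

```python
def relationship_direction(pairs):
    """Classify a set of (x, y) pairs by the sign of the summed cross-products."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of (x - mean_x) * (y - mean_y) over all pairs
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "no linear trend"


# Hypothetical paired values, made up for illustration
falling = [(1, 9), (2, 7), (3, 6), (4, 4), (5, 1)]
rising = [(1, 2), (2, 4), (3, 5)]
print(relationship_direction(falling))
print(relationship_direction(rising))
```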
Example 8: Consider the following table, which contains measurements on two variables for ten people: the number of hours the person spent riding a bicycle in the past week and the number of months the person has owned the bicycle. Use MINITAB to present a scatter plot for this information with the number of hours along the vertical axis and the number of months owned along the horizontal axis.
First, enter the values into two columns in MINITAB. Label the column with the number of hours spent riding as Y and label the column with the number of months owned as X.
To construct a scatter plot for the data set, select Graph-> Plot and the Plot dialog box will appear. In the Graph variables text box, enter the appropriate variables as shown in Figure 1.25. In the Data display box, select Symbol as the Display option. If you select OK, a scatter plot will be generated. This plot is shown in Figure 1.26.
Figure 1.25: Plot Dialog Box for Example 8
Observe from Figure 1.26 that as the X values increase, the Y values decrease. In this case we say we have a negative relationship between these two variables. That is, the number of hours spent riding decreases as the number of months the bicycle has been owned increases.
Example 9: The following table displays temperature data for 1988 taken from the middle of the Chesapeake Bay mouth. Use MINITAB to present a scatter plot for this information with the temperature along the vertical axis and the month along the horizontal axis.
Note: The months will be coded with the values 1, 2, 3, ..., 11, 12.
First, enter the values into two columns in MINITAB. Label the column with the temperature as Y and label the month as X. Follow the procedure as described in Example 8 and the resulting scatter plot will appear as in Figure 1.27.
Figure 1.26: Scatter Plot for Example 8
Figure 1.27: Scatter Plot for Example 9
Observe that the scatter plot has a non-linear pattern.
In later Explorations, we will use MINITAB to model patterns displayed in a scatter plot.
Example 10: Construct a grouped frequency distribution from the histogram displayed in Figure 1.22.
From Figure 1.22, if we look along the x-axis, we have intervals from 420 to 480, 480 to 540, etc. At the top of the boxes for these intervals observe the corresponding frequency counts for the intervals. We can use these two bits of information to construct the grouped frequency distribution. This is given in the following table.
Note: The endpoint of 480, for example, is listed in two intervals. It is listed as the upper limit for the first interval and as the lower limit for the second interval. We will use the convention that the upper limit of each interval is not included in that interval. This ensures that no value is counted in two different intervals.
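The convention that the upper limit is excluded can be sketched as follows. The intervals start at 420 and are 60 units wide, as read from Figure 1.22, and there are seven of them, as chosen in Example 6; the final boundary of 840 is simply what that arithmetic gives. The function name `grouped_frequencies` and the scores are hypothetical and are not the Grades.mtw data. The scores were chosen so that some of them fall exactly on a boundary.

```python
def grouped_frequencies(values, start, width, n_intervals):
    """Count values in intervals [lower, upper); the upper limit is excluded."""
    counts = [0] * n_intervals
    for v in values:
        index = int((v - start) // width)   # a value equal to an upper limit moves up one interval
        if 0 <= index < n_intervals:        # values outside all intervals are ignored
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_intervals)
    ]


# Hypothetical scores, made up for illustration; 480 and 540 sit on boundaries
scores = [441, 479, 480, 539, 540, 800]
table = grouped_frequencies(scores, start=420, width=60, n_intervals=7)
for lower, upper, f in table:
    print(f"{lower} to under {upper}: {f}")
```

Here the score of 480 is counted in the interval from 480 to 540 and not in the interval from 420 to 480.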
EXPLORATION #1: HOMEWORK ASSIGNMENT
Name: _____________________ Date: ______________________
Course #: ___________________ Instructor: _________________
Provide a print out of this projection graph.
Provide a print out of this histogram.
(l) Describe the general shape of the histogram generated in part (k). Discuss.
Provide a print out of the graph.
Provide a print out of the graph.
(d) What can you estimate with this graph? Discuss.
Provide a print out of the histogram.
(b) Describe the general shape of the histogram. Discuss.
Provide a print out of the graph.
Provide a print out of the graph.
In MINITAB, enter in one column these classifications and in another column, the number of deaths.
(a) Construct a pie chart for this data.
Provide a print out of the pie chart.
(b) Describe the general shape of the plot. Discuss.
(c) Construct a stem-and-leaf plot for the variable PH. Do not specify any increment in the Stem-and-Leaf dialog box.
Provide a print out of the plot.
(d) Describe the general shape of the plot. Discuss.
Source: The U.S. Dept. of Health and Human Services, National Center for Health Statistics, Monthly Vital Statistics Report, Oct. 11, 1994.
(mother, father) _______________________(years)
(g) What factors may change your life expectancy? Discuss.
8. The following table gives the population (in millions) of the United States from 1790 to 1990.
Source: U.S. Bureau of the Census.
Note: The data before 1960 does not include data from Alaska and Hawaii.
In MINITAB, enter the years in one column and the corresponding population in another column.
© 1995-2002 by Prentice-Hall, Inc.
A Pearson Company