Chapter 5 800 Children and Teens, part two
Previously, in “Awash”: We’re investigating the heights of 800 5–19-year-olds. We saw that, in general, males were taller than females. But our first graph was bogus, because it didn’t take age into account.
To try to make better sense of the data, we limited ourselves to the 10-year-olds. For those kids, the females were taller than the males. Not all of them, but in general. That is, the mean height for the 10-year-old girls was larger than the mean height for 10-year-old boys.
But we want to understand gender differences in height for all ages, not just age 10.
You can imagine (or you might have actually done this) doing the procedure we just did for 10-year-olds for every age from 5 to 19. You could get the mean height for males and females at every age, and then plot it.
That’s complicated. All that selecting and hiding, putting on the means, and (ick) writing it down, and then entering the values. Surely there’s a way to have the computer do that.
There is a way, and that’s what this chapter is about.
5.1 Making Groups
We’ll start by making groups, one group for each age. Concentrate on the table.
- Drag Age to the left in the table (don’t go to a graph!).
- Drop it in the blank area on the left of the table (it will turn yellow when you’re over it).
Now, on the left, there is one case for each Age. There are fifteen cases in all (why?).
- Click on one of the ages at the left. What happens in the graph? In the table?
Aha: clicking on an age selected all of the people who are that age. Also, you can see in the right side of the table that all of the people of that age are now together in the table—and selected.
When you dropped Age on the left, you sorted the table into 15 groups, one group for each age. You can think of it as a hierarchical table: on the left, a table of 15 ages, and on the right, within each age, a table of the people at that age.
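If it helps to see the same idea outside CODAP, here is a minimal Python sketch of that two-level table. The records are made up for illustration (they are not the real 800-person dataset), and the function name `group_by_age` is our own invention.

```python
from collections import defaultdict

# Made-up sample records, NOT the real dataset.
people = [
    {"Age": 10, "Gender": "Female", "Height": 141.0},
    {"Age": 5, "Gender": "Male", "Height": 110.0},
    {"Age": 10, "Gender": "Male", "Height": 139.0},
    {"Age": 5, "Gender": "Female", "Height": 108.0},
    {"Age": 19, "Gender": "Male", "Height": 176.0},
]


def group_by_age(cases):
    """Return {age: [cases with that age]}, like the left and right tables."""
    groups = defaultdict(list)
    for case in cases:
        groups[case["Age"]].append(case)
    return dict(groups)


groups = group_by_age(people)
for age, members in groups.items():
    # One "left-hand" row per age, with its "right-hand" cases counted.
    print(age, len(members))
```

The keys of `groups` play the role of the left-hand table (one entry per age), and each list plays the role of the right-hand table for that age.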
Grouping is our second core data move. Start to look for how grouping your data might help you. Frequently, when a dataset is large and complicated, grouping will help you make sense of it.
Watch out, though: making too many levels of groups can sometimes make a dataset more complicated than it needs to be!
5.2 Making Summary Calculations for Each Group
Now we want the mean height for each of our groups. To do that, we’ll make a new column in the “groups” table on the left, and write a formula for the column:
- Be sure the table is selected.
- On the left-hand side of the table, up at the top on the right, there is a gray circle with a plus sign in it. It might be hidden by some text.
- Click the gray plus thingy. A new column appears, with a name ready to be edited.
- Give it a good name such as MeanHeight. Press enter to finish editing. The column should be blank.
- Left-click on the column (attribute) name; a menu appears. Choose Edit Formula. A formula box appears.
- Enter mean(Height). Press Apply.
Hooray! The new column now shows the mean height for each age, each in its own row.
Does it bother you that the ages are not in order? Click on the column heading for Age to get the menu, then choose Sort Ascending.
The mean height is a summary of each group. This action of summarizing (sometimes also called aggregating) is the third core data move. We now have three: filtering, grouping, and summarizing.
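For comparison, here is roughly what "group, then summarize" looks like in Python. This is only a sketch: the (Age, Height) pairs are invented, and `mean_height_by_age` is a name we made up, not anything from CODAP.

```python
from collections import defaultdict
from statistics import mean

# Made-up (Age, Height) pairs, for illustration only.
cases = [(10, 140.0), (5, 110.0), (10, 142.0), (5, 108.0), (19, 170.0)]


def mean_height_by_age(pairs):
    """Group heights by age, then summarize each group with its mean."""
    heights = defaultdict(list)
    for age, height in pairs:
        heights[age].append(height)
    # sorted() plays the role of Sort Ascending on Age.
    return {age: mean(heights[age]) for age in sorted(heights)}


mean_height = mean_height_by_age(cases)
print(mean_height)
```

Each value in `mean_height` corresponds to one cell of the MeanHeight column: one summary number per group.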
A summary doesn’t have to be a mean. It might be a median, or a sum, or just the count (a.k.a. frequency) of the cases in the group. It could even be a percentage, like the percentage of people in the group who have a BMI under 30.
CODAP has a number of functions that serve as summaries. Here are four of the most important:
|mean(Height)|the mean of Height|
|median(Height)|the median of Height|
|sum(Height)|add up the values of Height|
|count()|how many cases there are|
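Here is a small Python sketch of the summaries named above (mean, median, sum, count, and a percentage) computed for a single group. The heights and BMI values are invented for illustration, and the key names in `summaries` are our own labels.

```python
from statistics import mean, median

# Made-up values for one group; the BMI numbers are invented too.
group = [
    {"Height": 138.0, "BMI": 17.5},
    {"Height": 141.0, "BMI": 31.0},
    {"Height": 150.0, "BMI": 22.0},
    {"Height": 143.0, "BMI": 19.0},
]

heights = [case["Height"] for case in group]

summaries = {
    "mean": mean(heights),
    "median": median(heights),
    "sum": sum(heights),
    "count": len(group),
    # Percentage of cases in the group with a BMI under 30.
    "pct_bmi_under_30": 100 * sum(case["BMI"] < 30 for case in group) / len(group),
}

print(summaries)
```

Whatever the summary is, the pattern is the same: many cases go in, and one number per group comes out.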
5.3 Finishing Our Investigation
The new column, MeanHeight, is first-class data like every other column.
That means you can make a graph using these mean heights.
So do it!
- Make a new graph; put Age on the horizontal axis and MeanHeight on the vertical.
You will see the pattern you might expect: people get taller as they age, up to a point.
We still don’t see the gender differences. Here’s what you do. Watch what happens carefully and make sure you understand it.
- Drag Gender left in the table and drop it next to Age.
Each item in the left table splits into males and females.
So where there were the 15 ages before, now there are 30 age-gender combinations.
Also, the right-hand table is now divided into 30 groups, one for each age-gender combination.
The MeanHeight column now automatically shows the mean height for the cases in that group.
There are also now 30 dots in the graph, one for each group instead of one for each person. But which dots are for the males and which for the females?
- Drag Gender from the table and plop it into the middle of the graph. The points color to show which is which. You should see the graph on the right:
Notice what a clean, clear story it tells. Boys’ and girls’ heights are about the same (girls a little taller in the tweens) until about age 13, at which point boys keep growing while the girls slow down. The left-hand graph has all the data, but it doesn’t tell the story as clearly as the right-hand graph.
Data-move reflection: When we moved Gender left, we changed the grouping, and took advantage of the summarizing that was already in place.
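That reflection can be sketched in Python, too. In this hedged example the summary code never changes; only the list of grouping attributes does. The records are made up, and the helper `mean_height` with its `keys` parameter is hypothetical, not a CODAP feature.

```python
from collections import defaultdict
from statistics import mean

# Made-up records, for illustration only.
people = [
    {"Age": 10, "Gender": "Female", "Height": 142.0},
    {"Age": 10, "Gender": "Female", "Height": 140.0},
    {"Age": 10, "Gender": "Male", "Height": 139.0},
    {"Age": 15, "Gender": "Female", "Height": 162.0},
    {"Age": 15, "Gender": "Male", "Height": 170.0},
    {"Age": 15, "Gender": "Male", "Height": 168.0},
]


def mean_height(cases, keys):
    """Group cases by the named attributes; take the mean Height of each group."""
    groups = defaultdict(list)
    for case in cases:
        groups[tuple(case[k] for k in keys)].append(case["Height"])
    return {key: mean(values) for key, values in sorted(groups.items())}


# Same summary, two different groupings.
by_age = mean_height(people, ["Age"])
by_age_gender = mean_height(people, ["Age", "Gender"])

for key, value in by_age_gender.items():
    print(key, value)
```

With two ages and two genders in this toy data, grouping by Age gives two groups and grouping by Age and Gender gives four, just as 15 ages became 30 age-gender combinations in the table.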