Chapter 6 A Second Assignment
Think about what we did with the investigation of gender and height. The original graph showed that the males were taller than the females. But that graph was bogus because it didn’t take age into account.
The point is that the simple, straightforward approach was too simple. To get a better story out of the data, one that more fairly describes what’s really going on, we had to dig deeper.
In this project, you can look again at the 2013 California data. Instead of stopping at the first, simplest conclusion or claim, you will dig deeper and add some nuance to the investigation. (You will also have an option to look at a different data set; stay tuned.)
6.1 How do you know when you’re done?
You have a Google Doc (or whatever format your instructor requires). It contains:
- Your name.
- A simple claim. Pick one:
  - People with more education make more money than people with less education.
  - People who get on BART at SFO generally get off at Powell Street.
- A graph with the first-look, obvious results.
- A description of why that simple approach might not be enough to really explain the data.
- A description of what you did to further the investigation (“dig deeper”). In this description, make sure that:
  - You have grouped the data by dragging an attribute (or attributes) to the left in the table.
  - You have calculated some summary value (e.g., a mean or a sum) for each group by making a new column with a formula.
- The results of that deeper investigation (with, e.g., a new graph of the data).
- A conclusion: is the story any different now?
- Possibly, ideas for additional “dig deeper” activities with this data set, and…
- A link to a shared CODAP doc (like last time) so the instructor can see what you did.
Your Google Doc is probably no more than two pages long. Be sure to set permissions so your instructor has edit access.
Where to get the data
Instead of using a “canned” data set, for this assignment you will use a data portal. In CODAP, this appears as a window where you specify what data you want, and then press a button to get it.
- Here is a link to a document with the BART data portal installed … BART is a regional transit system in the San Francisco Bay Area. SFO is the San Francisco International Airport. You can read all about the BART data portal in a separate chapter.
- ACS Census data for the education-and-income investigation … You can get data on education, income, race, gender, etc., and choose how many cases. Go to the options tab to choose what attributes you want.
ACS stands for “American Community Survey,” which is run by the Census Bureau, and collects data between the decennial Census years. The portal for the Census data looks like this:
Google Doc. If you do not yet know how to make a Google Doc, it’s time you learned. If you need to know more, ask a friend or Google it.
Including a Graph. How do you get the graph from CODAP into your file? At the moment (early 2020), you can’t just copy and paste. Here are two alternatives:
- Click the camera palette in the graph and…
  - choose Export Image (you can also Open in Draw Tool).
  - choose Local File and pick where you want to save the file.
  - after saving the file, import it (it’s in .png format) or do a copy/paste into your Doc.
- Use a screen-capture utility to get an image of your graph, and then paste it into your doc.
Pro tip: Be conscious of space. Noobs often just paste huge graphs into their documents and leave them that way. After pasting, shrink the graph so that it’s a reasonable size. What’s reasonable? When you print it out, it should still be easy to read any text. For a typical CODAP graph, that’s no bigger than about 1/6 of a page. If you know how to wrap text around a graphic, sometimes that can look very professional.
6.2 Example: Gender and Income
Here’s an example of the kind of thing we have in mind:
Suppose we were interested in gender and income. The simple approach is to (duh) plot gender and income. The graph alone looks vaguely like the men get more, but if you put the median on the graph, and rescale it, it’s really obvious:
This is what we probably expect: men earn more than women. But does that tell the whole story? How could we dig deeper?
We might ask:
- Is it possible that incomes really are equal, that we’re looking at it wrong?
- Can we be more nuanced? For example, is there some other factor that affects income?
Looking at the graph, see the large number of people who seem to earn zero—or close to it? That spike at zero is taller for women. Maybe that’s because more women work in the home, and are not paid.
So maybe the incomes for people with jobs are equal between men and women, but because more women do not get paid, their median income is lower overall. This reasoning is an example of exploring whether we’re looking at it wrong, and whether there is another factor (employment) that affects income. That is, it’s not just gender.
To test this idea, let’s just look at people with jobs. It turns out that we can select an attribute (a column) called EmplStatus for “employment status.”
That means you could filter to focus your investigation on people with jobs. That way, you can explore whether men with jobs generally earn more than women with jobs.
Try that in the live illustration below. The “employment status” attribute is at the far right of the table.
You should find that the men still earn more. If you hide everyone but those who are “Civilian employed,” the men earn $45,000 to the women’s $30,000. So the fact that more women do unpaid work does not completely explain the difference in income.
Notice that this will only work if you have downloaded EmplStatus data. If you get partway through your investigation and realize that you wish you had downloaded something else, or something more, you can’t add additional columns to the cases you already have. But remember: starting over is free. Just go back and get fresh data with the attributes you want.
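If it helps to see the same filter-then-compare move outside CODAP, here is a minimal Python sketch. It is only an analogy, not how CODAP works internally, and the rows are made up for illustration; they are not ACS data. Gender, EmplStatus, and the “Civilian employed” value come from the portal as described above, but the Income column name and the “Not in labor force” value are assumptions.

```python
from statistics import median

# Invented rows for illustration only (not real ACS data).
# "Income" and "Not in labor force" are assumed names.
people = [
    {"Gender": "Male", "EmplStatus": "Civilian employed", "Income": 50000},
    {"Gender": "Male", "EmplStatus": "Civilian employed", "Income": 40000},
    {"Gender": "Male", "EmplStatus": "Civilian employed", "Income": 60000},
    {"Gender": "Male", "EmplStatus": "Not in labor force", "Income": 0},
    {"Gender": "Female", "EmplStatus": "Civilian employed", "Income": 35000},
    {"Gender": "Female", "EmplStatus": "Civilian employed", "Income": 30000},
    {"Gender": "Female", "EmplStatus": "Civilian employed", "Income": 45000},
    {"Gender": "Female", "EmplStatus": "Not in labor force", "Income": 0},
    {"Gender": "Female", "EmplStatus": "Not in labor force", "Income": 0},
]


def median_income_by_gender(rows):
    """Group rows by Gender and return the median Income of each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["Gender"], []).append(row["Income"])
    return {gender: median(incomes) for gender, incomes in groups.items()}


# First look: everyone, including people with no paid work.
everyone = median_income_by_gender(people)

# Dig deeper: keep only the rows for people with jobs, then compare again.
employed_rows = [p for p in people if p["EmplStatus"] == "Civilian employed"]
employed = median_income_by_gender(employed_rows)

print("Everyone:", everyone)
print("Employed only:", employed)
```

With these invented rows, the men’s median is higher both before and after filtering, which mirrors the kind of result described above. The filter only works because each row carries an EmplStatus value, which is the same reason you need to download that attribute in the first place.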
6.3 Don’t forget to drag left!
We’ve seen what we mean by “dig deeper,” but don’t forget to read the assignment. It also expects you to do that “drag left” grouping move. In this case, that would be dragging Gender left to make groups of males and females. Then you would make a new column in which you would calculate the median income for each gender.
Look back at the section where we made groups by age if you don’t remember how.
Then think about how you can use grouping to help with your “dig deeper” work.
When you drag Gender to the axis of a graph, CODAP helps you do a grouping data move. Then, when you put the median on the graph, that’s summarizing. For this assignment, though, we want you to make those moves explicitly, grouping by dragging left and summarizing in a new column with a formula, because (a) it’s good practice and (b) it’s more flexible.
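To see why keeping the two moves separate is more flexible, here is a rough Python sketch of grouping and summarizing as two distinct steps. It is an analogy, not CODAP’s implementation: group_by and summarize are hypothetical helper names, the Income column name and the “Not in labor force” value are assumptions, and the rows are invented.

```python
from collections import defaultdict
from statistics import median

# Invented rows for illustration only (not real ACS data).
# "Income" and "Not in labor force" are assumed names.
rows = [
    {"Gender": "Male", "EmplStatus": "Civilian employed", "Income": 48000},
    {"Gender": "Male", "EmplStatus": "Civilian employed", "Income": 36000},
    {"Gender": "Male", "EmplStatus": "Not in labor force", "Income": 0},
    {"Gender": "Female", "EmplStatus": "Civilian employed", "Income": 33000},
    {"Gender": "Female", "EmplStatus": "Civilian employed", "Income": 29000},
    {"Gender": "Female", "EmplStatus": "Not in labor force", "Income": 0},
    {"Gender": "Female", "EmplStatus": "Not in labor force", "Income": 2000},
]


def group_by(rows, *attributes):
    """Grouping move: collect rows that share values for the given attributes."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in attributes)
        groups[key].append(row)
    return dict(groups)


def summarize(groups, attribute, func):
    """Summarizing move: compute one value per group, like a formula column."""
    return {
        key: func([row[attribute] for row in members])
        for key, members in groups.items()
    }


# Like dragging Gender left, then adding a median column.
by_gender = summarize(group_by(rows, "Gender"), "Income", median)

# Like dragging EmplStatus left as well: groups within groups.
by_status_and_gender = summarize(
    group_by(rows, "EmplStatus", "Gender"), "Income", median
)

for key, value in by_status_and_gender.items():
    print(key, value)
```

Because the grouping step takes any attributes you choose, the same summary can be recomputed for finer groups without changing anything else. That is the flexibility a graph alone does not give you.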
6.4 “Digging Deeper” and skepticism
“Digging deeper” means adding nuance to your investigation, bringing out important trends and effects that might not be obvious at first. It’s like fleshing out a sketch and making it more detailed.
But it’s also about being skeptical. When you make an initial, obvious graph, and draw some conclusion or make some claim, a skeptic steps back and wonders whether that’s really correct. Is there some other explanation for what you are seeing?
You can think of this as a “yes, but” attitude. It’s also the role of a “devil’s advocate”—someone whose job is to tear down an argument. If the devil’s advocate succeeds, that’s good, because without that skepticism, the conclusion would have been wrong. And if the devil’s advocate fails, that’s good too, because by passing that test, the original conclusion is stronger.
Skepticism is not always confrontational, however. You can also think of it as a “yes, and” attitude: the claim may be right as far as it goes, but there may be other important considerations.
In the case of gender and height, we saw that, indeed, males were generally taller than females, but only after about age 13.