**Title**

Gathering Data

**Description**

This lesson is the third in a six-lesson unit designed to help students become more familiar with data journalism. It follows the “Data in Scholastic and Professional Journalism” lesson and introduces students to the main sources of big data on the web. Students will also explore web-based tools for gathering data.

**Objectives**

- Students will identify sources of big data on the web.
- Students will use free web tools to gather web data that is not already in a table format.
- Students will create their own “how to” guides for using data scraping tools.

**Common Core State Standards**

| Standard | Description |
|---|---|
| CCSS.ELA-LITERACY.RST.9-10.7 | Translate quantitative or technical information expressed in words in a text into visual form (e.g., a table or chart) and translate information expressed visually or mathematically (e.g., in an equation) into words. |
| CCSS.ELA-LITERACY.RI.11-12.7 | Integrate and evaluate multiple sources of information presented in different media or formats (e.g., visually, quantitatively) as well as in words in order to address a question or solve a problem. |
| CCSS.ELA-LITERACY.RST.11-12.9 | Synthesize information from a range of sources (e.g., texts, experiments, simulations) into a coherent understanding of a process, phenomenon or concept, resolving conflicting information when possible. |

**Length**

At least two 60-minute class periods

**Materials / resources**

- Slideshow: Big Data on the Web
- Computer access
- Tabula software, downloaded and installed in advance (or loaded on at least one computer for demonstration and practice)
- Data set: AFL-CIO spreadsheet (for Tabula practice)
- Import.io, for web scraping practice

**Lesson step-by-step**

*Important note:* This lesson requires access to Tabula, a free program for extracting data tables from PDF files. Tabula must be downloaded and installed before use, so allow extra time to make sure the software is available for students on the computers you plan to use. If that is not possible, demonstrate Tabula to the class on a teacher’s computer, and then have students practice on it as time allows. You will need to spend some time familiarizing yourself with both Tabula and Import.io BEFORE this lesson.

*1. Introducing big data (50-60 minutes)*

Use the Big Data on the Web slideshow to introduce your students to the idea of big data: essentially, the kinds of databases used to create the stories students looked at in the previous lesson. If your students need reinforcement on how to access records or databases, teach the “Using databases” lesson first. Ideally, students should be able to try their own Google searches at computer stations while still having a clear view of the slideshow as you present. If that’s not possible, present the slideshow first, and then return to the slides that ask students to test their search skills.

**Pause** at Slide 8 and allow students to complete some advanced Google searches. This should take about 10 minutes.

**Pause** at Slide 11 to distribute the Open Records worksheet. Allow students 20-30 minutes to complete it.

Once students have completed the exercise, continue to Slide 12.

*2. Scraping data (60-90 minutes)*

**Note:** This part of the lesson requires students to learn both Tabula and Import.io, so it is imperative that the teacher have some working knowledge of these programs. Allow yourself ample time before the lesson to become familiar with how they work. Both offer online help pages, tutorials and FAQs.

Continue with the slideshow on Slide 12. Follow the teacher notes and talking points in the slideshow.

**Pause** for 20-30 minutes at Slide 16 for students to try Tabula and take notes on how it works.

**Pause** for 20-30 minutes at Slide 17 for students to try Import.io and take notes on how it works.

*3. Assessment (60 minutes)*

Students will create a “how to” guide for both Tabula and Import.io. Use the assignment sheet to discuss the project with students, then give them class time to complete it. The assignment offers two options based on students’ skill level. If QuickTime is installed on your computers, challenge students who are familiar with media technology to use the QuickTime screen capture function to record themselves using and narrating the tools. (Here is a tutorial on using the QuickTime screen capture tool.)

**Differentiation**

Advanced students should use the QuickTime screen capture function for their assessment. They could also create their “how to” guides in InDesign for practice with layout features.

Students who require additional support should be given a copy of the slides to follow along with the lecture. While the class completes the Open Records worksheet, students who need additional support or who are not ready to evaluate open records can continue practicing advanced Google searches by looking for information that interests them in PDF and Excel spreadsheet formats. When working with data, it often helps to break the work into separate skills so that struggling students can master one of them, such as advanced searching, more fully. Another option is to have these students search the school district website for examples of data that might interest the class.