Lab 2 Report

Exploring HathiTrust Dataset and Trans-Atlantic Slave Trade Dataset

HathiTrust Dataset

The HathiTrust dataset contains an extensive range of metadata fields, each holding many values, and some of them are indecipherable to me. The dataset seems to be meticulously constructed, but it does not make it easy for researchers to use and analyze its data. The long list of metadata left me unsure where to start. After some thought and some trial and error, I began with the imprint date and geographics. Fortunately, because I am familiar with Microsoft Excel, I could manage the data much better than when I had worked with GitHub. In Pivot Table 4, I built a table from the imprint date metadata, and it shows that the number of items increased drastically after 1946. From this, I hypothesized that the increase might be associated with the end of World War II, which may have left people more at ease and more able to read than they had been during the war. I then wondered how many items about war were published after 1946, so I created another table with the imprint date and genre metadata, filtering out all unrelated values. As Pivot Table 5 shows, the number of items about war is not high. The table does not explain this result, and the result does not mean that the war genre was neglected or not produced. A more sophisticated analysis with more metadata or data is necessary to investigate it. What I learned here is that data does not explain everything.
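
The two pivot tables described above were built in Excel, but the same counting can be sketched in Python with pandas. This is only a minimal sketch under stated assumptions: the column names `imprint_date` and `genre` are hypothetical, and the rows are made up for illustration rather than taken from the HathiTrust dataset.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the HathiTrust dataset.
# The column names "imprint_date" and "genre" are assumptions.
records = pd.DataFrame({
    "imprint_date": [1940, 1946, 1947, 1947, 1950, 1950, 1950],
    "genre": ["fiction", "war", "fiction", "poetry", "war", "fiction", "fiction"],
})


def count_by_year(df):
    """Count items per imprint year, like a one-field pivot table."""
    return df.groupby("imprint_date").size()


def count_by_year_for_genre(df, genre):
    """Count items per imprint year after keeping only one genre."""
    return count_by_year(df[df["genre"] == genre])


all_counts = count_by_year(records)                    # comparable to Pivot Table 4
war_counts = count_by_year_for_genre(records, "war")   # comparable to Pivot Table 5

print(all_counts)
print(war_counts)
```

Comparing the two results side by side shows only how many items carry the war label in each year; it still cannot say why, which is the limitation noted above.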

Then, I constructed another table with the geographics and imprint date metadata. I wondered how many books published in Asian countries (China, Japan, and Korea) from 1980 onward would appear. After placing geographics and imprint date in the rows and filtering out unnecessary values, I output the data. With the same metadata, I created two pivot tables, Tables 6 and 7; the difference between the two is which field is the primary grouping, geographics or imprint date. Even though the two pivot tables were constructed from the same metadata, the output data and the charts tell different stories. Pivot Table 6 shows how many items there are for each region or country; Pivot Table 7 arranges the data by imprint date. However, some of the data is inconsistently labeled, which makes it unreliable for analysis. For instance, “China, Shanghai (China),” “Nanjing (Jiangsu Shen, China)” should be labeled under “China”; “Ginza (Tokyo, Japan)” should be under “Japan.” Therefore, I wonder whether the dataset as a whole is valid for further research. Nevertheless, these two tables and their charts show that the same data can be presented differently depending on who designs the presentation, that is, on what the designer wants to show you or wants you to see, as Catherine D’Ignazio and Lauren F. Klein discuss in relation to data visualization.
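
The labeling problem and the two table orientations described above can be sketched in Python with pandas. This is a rough illustration, not the method used in the lab: the column names are assumptions, the rows are made up, and the helper `to_country` is a hypothetical, deliberately crude substring match that would need review against the real labels.

```python
import pandas as pd

# Made-up rows for illustration only; the column names are assumptions.
records = pd.DataFrame({
    "geographics": [
        "Shanghai (China)",
        "Nanjing (Jiangsu Shen, China)",
        "Ginza (Tokyo, Japan)",
        "Japan",
        "Korea",
    ],
    "imprint_date": [1980, 1981, 1980, 1981, 1980],
})

COUNTRIES = ["China", "Japan", "Korea"]


def to_country(label):
    """Map a place label to a country by substring match (hypothetical helper)."""
    for country in COUNTRIES:
        if country in label:
            return country
    return "Other"


# Group city-level labels under their country before counting.
records["country"] = records["geographics"].map(to_country)

# Same data, two orientations: country first (like Pivot Table 6)
# and imprint date first (like Pivot Table 7).
by_country = pd.crosstab(records["country"], records["imprint_date"])
by_year = by_country.T

print(by_country)
print(by_year)
```

Both tables hold exactly the same counts; only the choice of which field leads changes what the reader notices first.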

Trans-Atlantic Slave Trade Dataset

Compared to the HathiTrust dataset, the Trans-Atlantic Slave Trade dataset is scrupulously and neatly organized, and its metadata enables researchers to navigate and analyze the traces of the Black diaspora. I extracted several records from this dataset and could trace the slaves’ tragic journey. In the Copy of Pivot Table 9, for example, I selected “Outcome of voyage for slaves,” “Year of arrival at port of disembarkation,” and “Flag” in order to trace more clearly when and how the slaves traveled and what the outcome was. One record says that in 1656 one slave disembarked in the Old World, having been carried on a ship under the flag of Great Britain. The more data there is, the more slave narratives can be produced. Therefore, the Trans-Atlantic Slave Trade dataset seems to be an effective tool “to render black people as human” (Jessica Marie Johnson, 65).

However, what makes me uncomfortable is the data for “African resistance,” “Slaves died during middle passage,” and “Mortality rate.” First, the “African resistance” field contains data about what the slaves did to resist the slave hunters and traders. However, there is no data on the causes of resistance. That is, from this data, we cannot see the slaves’ own perspective on the resistance. This suggests that the metadata was structured and recorded by non-Black people. In “What Gets Counted Counts,” Catherine D’Ignazio and Lauren F. Klein say that “Lurking under the surface of so many classification systems are false binaries and implied hierarchies, such as the artificial distinctions between men and women, reason and emotion, nature and culture, and body and world.” In other words, the classification system for the data looks neutral, but it is designed by “the matrix of domination” (of Western white males) and follows their ideology. Furthermore, although many metadata fields provide sufficient data to research and scrutinize how the slaves were transported and traded, we cannot get enough data for “Slaves died during middle passage” and “Mortality rate.” Of course, there might be explanations for this, such as missing or scarce records. Nonetheless, the insufficient data suggests that the designers of the classification system are still members of the dominant group, and that more effort is required to see the slaves, or Black people, as human.