Posts

Showing posts from March, 2020

Creating a village-level shapefile for Pakistan using a Voronoi Tessellation

Using the latitude and longitude coordinates of human settlements in Pakistan, I create polygons that approximate the shape of villages. I recently found a shapefile hosted on the UN's Humanitarian Data Exchange (HUMDATA) portal that contains the coordinates of villages in Pakistan. Village-level shapefiles are extremely hard to come by, and most countries don't publicly release data at such a fine level of spatial disaggregation. However, conducting analysis at this level can substantially increase statistical power by increasing the number of observations. Consider the relative quantities of various administrative units in Pakistan. Districts are probably the most common sub-national administrative units of analysis in econometrics. In Pakistan, Tehsil and (sometimes) Union Council level data and shapefiles are available, allowing one to go below the district level. Even so, there are 46 times more villages than Union Councils and 1,696 more villages than Dis
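As a rough illustration of the tessellation idea in this excerpt, here is a minimal sketch using `scipy.spatial.Voronoi`. The coordinates are made-up placeholders rather than the HUMDATA village points, the helper name `bounded_cells` is hypothetical, and treating degrees of longitude and latitude as planar coordinates is a simplifying assumption. It is not necessarily how the post builds its shapefile.

```python
import numpy as np
from scipy.spatial import Voronoi

# Hypothetical settlement coordinates as (longitude, latitude) pairs.
# These are placeholders, not real village locations.
settlements = np.array([
    [70.0, 30.0],
    [70.2, 30.0],
    [70.1, 30.2],  # interior point, surrounded by the other six
    [70.0, 30.4],
    [70.2, 30.4],
    [69.8, 30.2],
    [70.4, 30.2],
])

vor = Voronoi(settlements)


def bounded_cells(vor):
    """Return {seed index: polygon vertices} for cells that are finite.

    Cells of seeds on the convex hull extend to infinity (marked by -1
    in vor.regions) and are skipped here.
    """
    cells = {}
    for point_idx, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if len(region) == 0 or -1 in region:
            continue
        verts = vor.vertices[region]
        # Voronoi cells are convex, so sorting the vertices by angle
        # around the seed gives a valid polygon ring.
        seed = vor.points[point_idx]
        angles = np.arctan2(verts[:, 1] - seed[1], verts[:, 0] - seed[0])
        cells[point_idx] = verts[np.argsort(angles)]
    return cells


cells = bounded_cells(vor)
for idx, polygon in cells.items():
    print(idx, polygon.round(3).tolist())
```

Only the interior settlement gets a finite polygon in this toy example. With real data, the unbounded cells along the edge would presumably need to be clipped to a boundary such as the national border before being written to a shapefile.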

Analyzing oil and gas firm responses to climate change using NLP

How often do oil and gas companies talk about climate change? How do shareholders exert pressure on these issues? This project analyzes the text from 1,656 earnings calls (over 16,000,000 words in total) to provide answers. All available earnings calls for the 10 largest publicly traded oil and gas companies were concatenated into a dataframe. Speech was classified into three categories depending on whether the speaker was a shareholder, a firm representative speaking unprompted, or a firm representative answering a shareholder question. A list of climate-related keywords was then used to extract paragraphs discussing climate change. Finally, depending on the keyword(s) contained therein, the sentiment of the paragraph was classified to reflect whether the speaker appears to accept or reject climate science, environmental regulations, carbon capture and storage, etc. The output is visualized below. Blue indicates unprompted firm speech, light blue indicates firm responses, an
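The keyword extraction step described in this excerpt might look something like the sketch below. The keyword list, the speaker labels, and the sample paragraphs are all invented for illustration, and `extract_climate_paragraphs` is a hypothetical helper; the post's actual keywords and sentiment rules are not shown here.

```python
import re
from collections import Counter

# Hypothetical keyword list; the project's real list is not given in the excerpt.
CLIMATE_KEYWORDS = ["climate change", "global warming", "emissions", "carbon capture"]
PATTERN = re.compile("|".join(re.escape(k) for k in CLIMATE_KEYWORDS), re.IGNORECASE)

# Invented sample paragraphs, one per speaker category named in the excerpt.
paragraphs = [
    {"speaker_type": "shareholder",
     "text": "What is your plan for reducing emissions?"},
    {"speaker_type": "firm_response",
     "text": "We are investing in Carbon Capture and storage."},
    {"speaker_type": "firm_unprompted",
     "text": "Production rose four percent this quarter."},
]


def extract_climate_paragraphs(paragraphs):
    """Keep paragraphs that mention at least one keyword, recording which ones."""
    matches = []
    for p in paragraphs:
        found = sorted({m.group(0).lower() for m in PATTERN.finditer(p["text"])})
        if found:
            matches.append({**p, "keywords": found})
    return matches


climate = extract_climate_paragraphs(paragraphs)
counts = Counter(p["speaker_type"] for p in climate)

for p in climate:
    print(p["speaker_type"], p["keywords"])
print(dict(counts))
```

Recording the matched keywords alongside each paragraph leaves room for a later step that assigns sentiment based on which keywords appear.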