After successfully making it through my first year on the academic tenure track, I realized two things: (1) organization is key, and (2) I need to get back to this blog!
I had high hopes for what I could accomplish here at housingthecity.com last year, but I got in my own way by not making it part of my strategic plan. No more! We are going to kick this summer semester off right with a series on a subject near and dear to my nerdy little heart: data science!
What’s data science?
Data science is one of the new buzzwords being whispered in the darkest corners of academic locker rooms across the world.
Academic locker rooms are a thing….trust me 😉
Together with Big Data, the phrase ‘data science’ conjures up images of unkempt, slouched geeks tapping away at a keyboard, with a screen that looks like a scene from The Matrix.
<- cool computer bro
The reality is much simpler. Data science is the study of computer-assisted methods to collect, store, and analyze large data sets. By computer-assisted, I mean that a researcher needs to understand how computers process and analyze information in order to get the most meaning from the data. This is because the data in data science problems is usually unstructured – in other words, it’s not produced with research in mind.
Where does unstructured data come from? The information age [think 1970 to present] has created hundreds of thousands of new devices: those devices produce data. Most of the data is about how humans live their lives: cell phones, activity trackers, social media, websites – things that are part of our everyday life now that weren’t even 15 years ago. Data science helps us use this data to answer social science questions.
For example, say you noticed a hashtag trending on Twitter that relates to your research interests. Ideally, you’d like to collect those tweets, store them securely, and analyze them to answer some research question. But how do you get the data? There are over 100k tweets using this hashtag – how do you store them all? Also, tweets have super complicated content – emojis, URLs, videos, pics, not to mention a whole lot of abbreviated words because of the 140-character limit: what’s the best way to analyze all this stuff?
Answer: data science!
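To make the "super complicated content" problem a little more concrete, here is a minimal Python sketch of a first analysis step: splitting raw tweet text into hashtags, URLs, and leftover words. The sample tweets, the `parse_tweet` helper, and the two regular expressions are all assumptions made up for illustration; real tweets are messier, and collecting them would need Twitter's own tools, which this sketch does not touch.

```python
import re
from collections import Counter

# Hypothetical sample tweets, invented for illustration only.
SAMPLE_TWEETS = [
    "Rent is up again 😩 #housing #rent https://example.com/report",
    "Gr8 thread on zoning reform #Housing",
    "New data on evictions: https://example.com/data #housing",
]

# Simple patterns (an assumption, not an official tweet grammar).
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def parse_tweet(text):
    """Split one raw tweet into hashtags, URLs, and the leftover words."""
    urls = URL_RE.findall(text)
    no_urls = URL_RE.sub("", text)
    # Lowercase hashtags so #Housing and #housing count as the same tag.
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(no_urls)]
    # Whatever remains (including emojis and abbreviations) is kept as words.
    words = HASHTAG_RE.sub("", no_urls).split()
    return {"hashtags": hashtags, "urls": urls, "words": words}


parsed = [parse_tweet(t) for t in SAMPLE_TWEETS]
hashtag_counts = Counter(tag for p in parsed for tag in p["hashtags"])
print(hashtag_counts.most_common(2))
```

Even this toy version shows why the data is called unstructured: the emoji and the abbreviation "Gr8" survive as ordinary "words", and deciding what to do with them is part of the analysis, not something the data tells you.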
You had me at data! Let’s get analyzing!
Remember that data science is about collecting, storing and analyzing unstructured data. Before you can use data science as part of your research, you need to learn 5 things:
5 Basic Data Science Skills
- How to code in the Python and R programming languages
- How to store large amounts of data
- How to process a lot of data (ex. parallel processing, MapReduce, batch processing)
- Methods to analyze large amounts of data (ex. machine learning)
- How to visualize your results (imagine using a table to summarize 100k tweets?! Eek!)
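As a small taste of the processing skill in that list, here is a rough Python sketch of the idea behind batch processing and MapReduce: cut the data into batches, summarize each batch on its own (the "map" step), then merge the partial summaries (the "reduce" step). The hashtag lists and the helper names `batches`, `map_batch`, and `merge_counts` are hypothetical, and everything runs one batch after another on a single machine; in a real setup the map step is the part that would be spread across many processors or computers.

```python
from collections import Counter
from functools import reduce

# Hypothetical hashtag lists, one per tweet, invented for illustration.
TWEET_HASHTAGS = [
    ["housing", "rent"],
    ["housing"],
    ["zoning", "housing"],
    ["rent"],
    ["zoning"],
]


def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def map_batch(batch):
    """Map step: count hashtags within a single batch."""
    return Counter(tag for tags in batch for tag in tags)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Each batch is summarized independently, then the summaries are merged.
partial_counts = [map_batch(b) for b in batches(TWEET_HASHTAGS, size=2)]
total_counts = reduce(merge_counts, partial_counts, Counter())
print(total_counts)
```

Because each batch is counted without looking at any other batch, no single step ever needs all 100k tweets in memory at once.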
Every two weeks, I’ll post a new article discussing each of the five skills (don’t worry, I have an alert set!).
I’m off to nerd out at a computational social science institute. In the meantime, stay curious!