Learn how data science can help derive insight from social media content, starting with the different types of social media data and their nature, such as raw data vs. metadata. Jungwoo explains how data science techniques such as text mining and parsing, applied to data retrieved through an API, are used to derive insight from social media data.
- [Voiceover] More and more people are using social media. This in turn generates an enormous amount of data. Data scientists are naturally attracted to these emerging types of data sets. Social media refers to websites where users can post their own content to share it with their friends and beyond. Depending on their focus, social media sites promote different types of interests. For example, Facebook offers a forum for building informal and personal relationships, whereas LinkedIn is a professional networking tool.
In addition to its size qualifying as big data, another unique value of social media data lies in the data about data, or metadata, it carries. For example, a post on Facebook can be accompanied by location information as well as timestamps. With these kinds of unstructured but very rich data sets, a lot of useful insight can be derived about the people who are posting and consuming information.
For example, IBM has a product called Personality Insights, which offers a profiling service for companies that would like to know more about their customers. In social media analytics, text mining and parsing are an important and necessary first step. Social media companies often make their content available through an application programming interface, or API.
Using this API, data scientists can retrieve the data they want. Collecting the social media data is one thing, but manipulating it for analysis purposes is another. A lot of skill and effort is necessary before attempting to apply analytics methods, although standards like JSON help.
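As a rough illustration of the parsing and text mining step described above, the following Python sketch takes a JSON payload of the kind an API might return, pulls out the metadata (timestamp and location), and counts word frequencies in the post text. The payload, its field names (`text`, `created_at`, `location`), and the helper functions `parse_posts` and `word_counts` are all hypothetical, made up for this example; they do not reflect any real platform's API or schema.

```python
import json
import re
from collections import Counter
from datetime import datetime

# Hypothetical API response. The field names are illustrative assumptions,
# not the schema of any real social media platform.
RAW_RESPONSE = """
[
  {"id": 1,
   "text": "Loving the new coffee shop downtown! Great coffee.",
   "created_at": "2016-03-01T09:15:00+00:00",
   "location": {"city": "State College", "lat": 40.79, "lon": -77.86}},
  {"id": 2,
   "text": "Coffee before class, as always.",
   "created_at": "2016-03-01T13:40:00+00:00",
   "location": null}
]
"""


def parse_posts(raw):
    """Turn the raw JSON string into flat records with typed metadata."""
    posts = []
    for item in json.loads(raw):
        # Location metadata is optional, so fall back to an empty dict.
        location = item.get("location") or {}
        posts.append({
            "id": item["id"],
            "text": item["text"],
            "created_at": datetime.fromisoformat(item["created_at"]),
            "city": location.get("city"),
        })
    return posts


def word_counts(posts):
    """Count lowercase word occurrences across all post texts."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z']+", post["text"].lower()))
    return counts


posts = parse_posts(RAW_RESPONSE)
counts = word_counts(posts)

for post in posts:
    print(post["id"], post["created_at"].isoformat(), post["city"])
print(counts.most_common(1))
```

Real analysis would usually add steps this sketch leaves out, such as removing common stop words, handling emoji and hashtags, and paging through the API's results.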
Jungwoo Ryoo is a professor of information science and technology at Penn State. Here he reviews the history of data science and analytics, explores which markets are using big data the most, and reveals the five main skill areas: data mining, machine learning, natural language processing (NLP), statistics, and visualization. This leads to a discussion of the five biggest career opportunities, the four leading industry-recognized certifications available, and the most exciting emerging technologies. Along the way, Jungwoo discusses the importance of ethics and professional development, and provides pointers to online resources for learning more.
- A history of data science
- Why analytics is important
- How data science is used in social media, climate research, and more
- Data science skills
- Data science certifications
- The future of big data