CIVIC TECH ASSESSMENT RESOURCES 3: CONTENT AND SEMANTIC ANALYSIS

3. Content and Semantic Analysis

Content analysis is the systematic identification of the qualities of text. By analyzing the content of discussion threads, comments, or posts, you can identify trends such as issues, key concepts, types of discourse, and sentiment, as well as the relationships between users and the content they produce and patterns in user activity related to specific content. Content analysis can be done in two ways.

● Hand-Coding – which means reading text and assigning values or labels to passages or entire sections, either by hand or with the help of qualitative analysis software, based on a coding guide that you design.

● Using Semantic Software – which means using software to analyze text and identify themes, issues, and concepts, and the relationships between them.

Hand-coding messages can be extremely time-consuming if you have large amounts of data, but it allows you to interpret the context of a message and to filter out information that is not relevant to the analysis at hand. For example, when examining user messages that refer to civic issues, you may be interested in the order or co-location of other issues (e.g., a post that begins with a lost dog, a neighborhood topic, and then shifts into a discussion about the lack of green space in a community, a civic issue). Front Porch Forum has hypothesized that conversations on forums move over time from utility (find my dog) to building trust to organizing (crime issues) to controversial topics and issues. Community PlanIT is hand-coding 10,000 comments from one of its recent city-wide games and using content analysis to create a new code book for categorizing online civic engagement. The team hopes the code book will show that some online activity that is not currently seen as a form of engagement might count as engagement in the future.
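Once posts have been hand-coded, the tallying itself is simple to automate. The following is a minimal sketch in Python, assuming a made-up coding guide and three invented posts; the code labels, post IDs, and function names are illustrative and do not come from Front Porch Forum or Community PlanIT. It counts how many posts received each code and records the order in which codes appear within a post, which is one way to look at the shifts from neighborhood topics to civic issues described above.

```python
from collections import Counter

# Hypothetical coding guide: code label -> short description.
CODEBOOK = {
    "utility": "practical requests, e.g. a lost pet",
    "trust": "neighbors offering help or thanks",
    "organizing": "calls to meet or act together",
    "civic_issue": "public concerns, e.g. lack of green space",
}

def tally_codes(coded_posts):
    """Count how many posts each code was assigned to."""
    counts = Counter()
    for codes in coded_posts.values():
        unknown = set(codes).difference(CODEBOOK)
        if unknown:
            raise ValueError(f"codes not in coding guide: {sorted(unknown)}")
        counts.update(set(codes))  # count each code once per post
    return counts

def code_transitions(codes):
    """Return ordered pairs of consecutive, distinct codes within one post."""
    return [(a, b) for a, b in zip(codes, codes[1:]) if a != b]

# Invented example: each post ID maps to its codes in reading order.
coded_posts = {
    "post-1": ["utility", "civic_issue"],
    "post-2": ["utility"],
    "post-3": ["trust", "organizing", "civic_issue"],
}

counts = tally_codes(coded_posts)
transitions = Counter(
    pair for codes in coded_posts.values() for pair in code_transitions(codes)
)
print(counts)
print(transitions)
```

Rejecting codes that are not in the coding guide is a deliberate choice here: it catches typos made by coders before they distort the counts.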

When it comes to semantic software, there is a full range of tools, from word clouds that give a quick look at the prominence of key words to custom software designed to filter your content. If you need a more in-depth, specialized analysis, consider working with your developers to design a Natural Language Processing (NLP) algorithm that filters platform or social media data to meet your needs. For example, as part of its solution for managing the high volume of crisis texts it receives, DoSomething.org’s team developed an NLP tool that sends responses based on key words in the original messages.
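To make the idea of keyword-based filtering concrete, here is a minimal sketch in Python. It is not DoSomething.org's tool, whose design is not described here; the rules, keywords, and response texts are hypothetical, and a production system would need far more careful language handling than simple word matching.

```python
import re

# Hypothetical keyword-to-response rules; real rules would come from your team.
RULES = [
    ({"pothole", "streetlight", "sidewalk"},
     "Thanks. We have forwarded this to public works."),
    ({"meeting", "agenda", "council"},
     "The next council meeting schedule is posted on the city site."),
]
DEFAULT_RESPONSE = "Thanks for your message. A staff member will follow up."

def tokenize(text):
    """Lowercase the message and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def pick_response(message):
    """Return the response of the first rule whose keywords appear in the message."""
    words = tokenize(message)
    for keywords, response in RULES:
        if words & keywords:  # at least one keyword is present
            return response
    return DEFAULT_RESPONSE

print(pick_response("There is a huge POTHOLE on Elm St"))
print(pick_response("Hello there"))
```

Rules are checked in order, so a message that matches several rules gets the response of the first one; putting the most urgent categories first is one simple way to set priorities.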

The image below is a visualization of an NLP analysis of the raw text of forum posts that shows the major concepts discussed by participants in a certain time period. This visualization focuses on forum activity around firearms. When you select a concept, the raw text of the related forum posts appears on the right-hand side, so that you can quickly see which users were engaged in the discussion and how:

This graphic representation of the results of an NLP analysis on forum posts was produced by an interactive console, now in beta, which was designed by the data visualization and software engineering firm Exaptive.

This map is an example of Natural Language Processing analysis of a listserv discussion on the topic of firearms. The colored nodes are the people involved in the discussion. The grey nodes are the topics. The streamgraph at the bottom shows the topic over time.

Many tools exist to assist with content analysis. You can analyze hand-coded content in a simple Excel document, with the help of more sophisticated, free tools such as Dedoose, which allows users to create a coding guide and quickly assign codes to areas of text, or the similar online Coding Analysis Toolkit, or with complex social science software such as Atlas.ti. An analysis of the Twitter conversations of all of the U.K.’s Members of Parliament in 2013 highlighted the top issues, providing a look at what is being said, when, and by whom.

If you have a large data set, you might prefer to use one of the increasingly available online programs for text analysis. These range from very basic tools, such as word-cloud visualization programs that give greater prominence to words that appear more frequently, to intermediate programs such as Gephi, Tableau, Leximancer/Text Is Beautiful, DiscoverText or others noted in more detail here, which perform more complex analysis and identify concepts and the correlations between them in addition to basic frequency analysis.
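The basic frequency analysis behind a word cloud can be sketched in a few lines of Python. The example below is an illustration only: the posts are invented, the stopword list is a tiny hypothetical one, and the function name is an assumption rather than part of any of the tools named above.

```python
import re
from collections import Counter

# A small, hypothetical stopword list; real analyses use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "for", "on", "our", "we"}

def word_frequencies(texts, top_n=10):
    """Return the top_n most frequent words across texts, ignoring stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)

# Invented forum posts for illustration.
posts = [
    "We need more green space in our neighborhood.",
    "The park is the only green space near the school.",
    "Parking near the school is a problem.",
]

for word, count in word_frequencies(posts, top_n=5):
    print(word, count)
```

A word cloud simply maps each count to a font size. Note that this sketch treats "park" and "parking" as different words; grouping related word forms is one of the things the more complex programs do.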

Semantic analysis software can be especially useful if you are combining large data sets. For example, if you are interested in how closely forum conversations on civic issues over the course of a year correlate to local city council activity, or to state legislative activity, you could use semantic analysis software to map out the correlations between key words and concepts by searching through the forum message content alongside the city council minutes, or data on state legislative activity. Open Congress and Sunlight Labs both have a variety of downloadable data at the state level, and even some local data.
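As a rough illustration of this kind of comparison, the Python sketch below counts how often a key word appears each month in two sets of documents and computes the Pearson correlation between the two monthly series. Everything in it is assumed for the example: the months, the texts, and the function names are made up, and it does not use the actual data formats of Open Congress or Sunlight Labs. Dedicated semantic analysis software works with concepts rather than a single key word.

```python
import math

def monthly_counts(documents, keyword):
    """documents: list of (month, text) pairs. Returns {month: keyword occurrences}."""
    counts = {}
    for month, text in documents:
        # Simple substring count, so "zoning" also matches inside "rezoning".
        counts[month] = counts.get(month, 0) + text.lower().count(keyword.lower())
    return counts

def pearson(xs, ys):
    """Pearson correlation of two equal-length series; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def keyword_correlation(forum_docs, council_docs, keyword, months):
    """Correlate monthly keyword counts in forum posts and council minutes."""
    forum = monthly_counts(forum_docs, keyword)
    council = monthly_counts(council_docs, keyword)
    xs = [forum.get(m, 0) for m in months]
    ys = [council.get(m, 0) for m in months]
    return pearson(xs, ys)

# Invented data: (month, text) pairs for forum posts and council minutes.
months = ["month-1", "month-2", "month-3"]
forum_docs = [
    ("month-1", "Question about zoning."),
    ("month-2", "Zoning again. More zoning talk."),
    ("month-3", "Zoning, zoning, zoning."),
]
council_docs = [
    ("month-1", "Zoning item. Zoning vote."),
    ("month-2", "Zoning zoning zoning zoning."),
    ("month-3", "Zoning " * 6),
]

r = keyword_correlation(forum_docs, council_docs, "zoning", months)
print(r)
```

A high correlation only shows that the two sources mention the key word at similar times; it does not show that forum discussion influenced council activity or the reverse.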