For this project, I started with the transcripts of the first six GOP debates (including the moderators) and split the statements by speaker and by debate. I then used a statistical test to identify the significant phrases among the statements, and calculated a "strength of signal" index for each phrase based on its information gain relative to the entire debate performance.
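To make the idea concrete, here is a minimal sketch of one way such a test could work. It is an illustration only: it assumes a log-likelihood ratio (G²) test on two-word phrases, which is a common choice but not necessarily the test used for this project. The function names, the simple tokenizer, and the 3.84 threshold (the chi-square critical value at p = 0.05 with one degree of freedom) are all assumptions.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, strip basic punctuation, and join adjacent words with '_'."""
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    return ["_".join(pair) for pair in zip(tokens, tokens[1:])]


def g2(k1, n1, k2, n2):
    """Log-likelihood ratio for a phrase seen k1 times out of n1 phrases
    by one speaker and k2 times out of n2 phrases by everyone else."""
    def loglik(k, n, p):
        # Binomial log-likelihood; terms with a zero count are skipped
        # so that log(0) is never evaluated.
        out = 0.0
        if k > 0:
            out += k * math.log(p)
        if n - k > 0:
            out += (n - k) * math.log(1 - p)
        return out

    pooled = (k1 + k2) / (n1 + n2)
    return 2 * (loglik(k1, n1, k1 / n1) + loglik(k2, n2, k2 / n2)
                - loglik(k1, n1, pooled) - loglik(k2, n2, pooled))


def significant_phrases(statements, threshold=3.84):
    """statements: iterable of (speaker, text) pairs.
    Returns {speaker: [(phrase, score), ...]} sorted by descending score,
    keeping only phrases the speaker over-uses relative to the others."""
    per_speaker = {}
    for speaker, text in statements:
        per_speaker.setdefault(speaker, Counter()).update(bigrams(text))

    total = Counter()
    for counts in per_speaker.values():
        total.update(counts)
    n_total = sum(total.values())

    result = {}
    for speaker, counts in per_speaker.items():
        n1 = sum(counts.values())
        n2 = n_total - n1
        scored = []
        if n1 > 0 and n2 > 0:
            for phrase, k1 in counts.items():
                k2 = total[phrase] - k1
                if k1 / n1 > k2 / n2:  # over-represented for this speaker
                    score = g2(k1, n1, k2, n2)
                    if score >= threshold:
                        scored.append((phrase, score))
        scored.sort(key=lambda item: -item[1])
        result[speaker] = scored
    return result
```

Under these assumptions, summing a speaker's scores would give a rough per-candidate total of signal strength.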
From this, I was able to present the significant phrases grouped by speaker and rank each phrase's importance to the candidate's message. Selecting a phrase in the bubble chart shows the context in which the speaker used it, and displays a histogram of the total signal strength per candidate for the selected phrases.
The large number of candidates in the GOP debates made them a good starting point for experimenting with data mining of political debates; stay tuned for a similar analysis of the Democratic debates, and for further topical analysis of the debate performances. Many of the phrases overlap ("radical_islamic" and "islamic_terrorism", for example); grouping phrases by topic should help identify similarities of message between candidates.
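As a rough idea of what grouping overlapping phrases could look like, the sketch below merges phrases that share at least one word. This is a hypothetical first pass and not a real topic model: the function name `group_phrases` and the shared-word rule are assumptions, and common words would need to be filtered out first or they would merge unrelated phrases.

```python
def group_phrases(phrases):
    """Group '_'-joined phrases that share at least one word,
    using a small union-find over the phrases."""
    phrases = list(dict.fromkeys(phrases))  # drop duplicates, keep order
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    first_seen = {}  # word -> first phrase that contained it
    for p in phrases:
        for word in p.split("_"):
            if word in first_seen:
                parent[find(p)] = find(first_seen[word])
            else:
                first_seen[word] = p

    groups = {}
    for p in phrases:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(group) for group in groups.values())


print(group_phrases(["radical_islamic", "islamic_terrorism", "tax_plan"]))
# [['islamic_terrorism', 'radical_islamic'], ['tax_plan']]
```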