Twitter Trends Prediction Project

In Collaboration with

Guided by - Prof. Amitabha Bagchi and Prof. Maya Ramanath

Overview of the work

  • Mining news-related tweets from large volumes of Twitter data.
  • Making the system robust and optimized to work with real-time feeds of millions of tweets per day.
  • Creating a social graph of Twitter users from their following relations.
  • Developing a model that distinguishes trends from non-trends, that is, detecting a trend at a very early stage.
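
The social graph item above can be illustrated with a minimal sketch. It assumes the following relations are available as (follower, followee) pairs; the function names and the sample users are hypothetical and are not taken from the project code, which stored its graph in Neo4j.

```python
from collections import defaultdict


def build_follow_graph(follow_edges):
    """Build adjacency maps from (follower, followee) pairs."""
    following = defaultdict(set)  # user -> accounts the user follows
    followers = defaultdict(set)  # user -> accounts that follow the user
    for follower, followee in follow_edges:
        following[follower].add(followee)
        followers[followee].add(follower)
    return following, followers


def reciprocal_pairs(following):
    """Return the unordered pairs of distinct users who follow each other."""
    pairs = set()
    for user, followees in following.items():
        for other in followees:
            # .get() avoids creating new keys while we iterate.
            if other != user and user in following.get(other, ()):
                pairs.add(frozenset((user, other)))
    return pairs


# Hypothetical sample data: alice and bob follow each other, carol follows alice.
edges = [("alice", "bob"), ("bob", "alice"), ("carol", "alice")]
following, followers = build_follow_graph(edges)
mutual = reciprocal_pairs(following)
```

Keeping both directions of the relation makes it cheap to ask who a user follows and who follows that user, which is the kind of lookup a diffusion or influence feature would likely need.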

Final presentation here

Final thesis here

A web tool to generate training data of 'good' and 'bad' tweets. NOTE: Accessible only from the IIT network.

Link to the earlier work in the minor project here.

Paper title | Remarks
Time is of the Essence: Improving Recency Ranking Using Twitter Data | A very relevant paper for both the minor project and the internship work. The paper tries to improve ranking by using Twitter feeds. It uses URLs in tweets to discover recent documents, crawls those URLs, and uses several tweet and network features to rank tweets.
Recognizing Named Entities in Tweets | Most NLP tools, such as OpenNLP, perform very poorly at NER or POS tagging on tweet data because of the nature of tweets. The paper proposes a new method for recognizing named entities in tweets using a learner trained on tweets.
Dynamic Relationship and Event Discovery | First identifies entities buzzing in the same time window as potential candidates, then applies event discovery algorithms to extract events and learn relationships among entities dynamically.
Information Credibility on Twitter | An automatic system for measuring the credibility of tweets. First filters for newsworthy tweets only, then applies supervised learning on tweet features to classify each tweet as credible or not.
Event Summarization Using Tweets | First identifies events from bursts of tweets of a specific nature (American football in the paper). Then identifies sub-events from the frequent terms in each burst (Hidden Markov Model), and summarizes the event at every burst.
Summarizing Sporting Events Using Twitter | First identifies bursts of tweets, then removes noise (as defined in the paper), and then uses a phrase graph to generate sentences from the tweet bursts. The top sentences for each burst explain the moment behind that burst.
Do All Birds Tweet the Same? Characterizing Twitter Around the World | Measures the reciprocal and hierarchical nature of the Twitter social network. Studies happiness (as defined in the paper) and the behaviour of interest groups across different demographics, and makes some observations about them.
Everyone’s an Influencer: Quantifying Influence on Twitter | Defines 'influence' as one's ability to make followers repost or retweet the same URL or tweet. Measures the influence of URLs and people by the pattern of information dissemination. Also predicts the future influence of any URL or person.
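
Several of the papers above start by finding bursts of tweets. As a rough illustration only, the sketch below flags time windows whose tweet count exceeds the mean by `k` standard deviations. This simple threshold rule is a generic stand-in, not the method of any listed paper; the name `find_bursts`, the parameter `k`, and the sample counts are assumptions.

```python
from statistics import mean, pstdev


def find_bursts(counts, k=2.0):
    """Return indices of windows whose count exceeds mean + k * std."""
    if not counts:
        return []
    threshold = mean(counts) + k * pstdev(counts)
    return [i for i, c in enumerate(counts) if c > threshold]


# Hypothetical tweets-per-minute counts with one obvious spike.
counts = [10, 12, 9, 11, 10, 95, 12, 10]
bursts = find_bursts(counts)
```

A real-time system would probably compute the baseline over a sliding window of past counts rather than the whole series, so that a burst does not inflate its own threshold.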
Paper title | Remarks
Information Credibility on Twitter | Collects all tweets related to trending topics (trending topics are identified from twitter_monitor). Labels were given by Mechanical Turk evaluators, and the results were then used for the supervised training phase.
Streaming Trend Detection in Twitter | TF-IDF frequency analysis of words. Pre-processing filters out tweets with less than 60% English unigrams and forms a stop-word list.
Topic Detection in Noisy Data Sources | Uses Latent Dirichlet Allocation (LDA) on sentences instead of documents. Uses Stanford NLP for NER detection and then runs LDA on the sentences. Also builds clusters (of topics) and checks accuracy.
An Efficient Clustering Algorithm for Microblogging Hot Topic Detection | Uses tf*idf to find the important tokens, taking the top N tf*idf-scoring tokens as the features of a tweet. Based on these features, Bayes classification is used to classify tweets into different categories and to drop valueless tweets.
Using Twitter to Recommend Real-Time Topical News | Uses both RSS feeds and tweets to suggest news articles. Identifies common terms, applies TF-IDF, assigns a score to each article, and chooses the top k.
Recognizing Named Entities in Tweets | Proposes a novel NER system for tweets, which combines a KNN classifier with a CRF labeler under a semi-supervised learning framework.
Emerging Topic Detection on Twitter based on Temporal and Social Terms Evaluation | Formalizes the keyword life cycle, leveraging a novel aging theory intended to mine terms that frequently occur. Also builds a topic graph and a user social-relationship graph to detect emerging topics.
Emerging Topic Detection using Dictionary Learning | Uses dictionary learning to identify topics (TF-IDF) and then clusters the topics (K-means).
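
Several remarks above rely on picking the top N tf*idf tokens of a tweet as its features. The following is a minimal sketch of that idea under simple assumptions: whitespace tokenization, term frequency normalized by tweet length, and idf computed as log(total tweets / tweets containing the token). The function name and the sample tweets are hypothetical, and the listed papers may use different weighting variants.

```python
import math
from collections import Counter


def top_tfidf_tokens(tweets, n=3):
    """Return, for each tweet, its n highest-scoring tokens by tf * idf."""
    docs = [tweet.lower().split() for tweet in tweets]
    # Document frequency: number of tweets that contain each token.
    df = Counter(token for doc in docs for token in set(doc))
    total = len(docs)
    features = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            token: (count / len(doc)) * math.log(total / df[token])
            for token, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda token: (-scores[token], token))
        features.append(ranked[:n])
    return features


# Hypothetical sample tweets.
tweets = [
    "earthquake hits city",
    "city council meets",
    "earthquake relief city",
]
features = top_tfidf_tokens(tweets, n=2)
```

In this sample the token "city" appears in every tweet, so its idf is zero and it never ranks among the top features, which is the behaviour that makes tf*idf useful for dropping uninformative words.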
Paper title Remarks
Comparing Twitter and Traditional Media Using Topic Models
Large Scale Microblog Mining Using Distributed MB-LDA
Paper title Remarks
Detecting Trend and Bursty Keywords Using Characteristics of Twitter Stream Data
Trend Analysis of News Topics on Twitter
Measuring User Influence in Twitter: The Million Follower Fallacy
Detecting Twitter Trends in Real-Time
GeoWatch: Online detection of Geo-Correlated Information Trends In Social Networks
TwitterMonitor: Trend Detection over the Twitter Stream
Social Networking Trends and Dynamics Detection via a Cloud-based Framework Design
Trends Prediction Using Social Diffusion Models
Trend or No Trend: A Novel Nonparametric Method for Classifying Time Series
The following popular open-source tools have been used:
  1. Entity detection (NLP tools): Twitter_nlp, OpenNLP, Stanford NLP tool, Open Calais, Key Graph
  2. Data collection: Twitter API
  3. ML tools: WEKA
  4. Graph storing: Neo4j
  5. Graph processing: py2neo, Gremlin, Cypher
  6. Graph visualization: Gephi

Reports explaining earlier work done in the project

Master project work

The work done in the Yahoo summer internship.

The work done in the minor project at IIT Delhi.