In conjunction with the IEEE International Conference on Data Mining (ICDM 2015)
November 14, 2015, Atlantic City, New Jersey, USA
Social media channels enjoy many advantages over traditional media channels, such as ubiquity, mobility, immediacy, and seamless communication, in reporting, covering, and sharing real-world events such as the Boston bombings, the NBA finals, and the U.S. presidential elections. Given these advantages, social media posts such as tweets can typically reflect events as they happen, in real time. Despite these benefits, social media channels also tend to be noisy, chaotic, and overwhelming. As a result, the vast amount of noisy social media data poses tremendous challenges for in-depth analysis, which is critical to applications such as event playback, journalistic investigation, and storytelling. The purpose of this workshop is to bring together researchers working in a variety of areas related to the larger problem of analyzing and understanding events through social media responses, to discuss: 1) which recently developed machine learning and data mining techniques can be leveraged to address the challenges of analyzing events using social media data, and 2) which practical research directions for the machine learning and data mining community arise from these challenges.
Topics of Interest
We encourage submissions on a variety of topics, including but not limited to:
EASM 2015 solicits regular technical papers of up to 6 pages following the IEEE author guidelines, as well as short papers of up to 2 pages. Regular papers will be presented in an oral session; short papers will be presented in a demo or poster session. Submissions must be original and must not be submitted to or accepted by any other conference or journal. All submissions will be peer-reviewed by at least two Program Committee members. The review process is double-blind; therefore, authors must conceal their identity (no author names, affiliations, acknowledgments of sponsors, or direct references to previous work).
Please submit your contribution via the submission system
Yuheng Hu is an assistant professor at the University of Illinois at Chicago. Yuheng works at the interface of Social Computing, HCI, and Machine Learning. His research focuses on developing algorithms, tools, and systems to characterize, make sense of, and predict people's reactions on social media to different real-world events. His work has been published at highly reputed conferences including AAAI, IJCAI, ICWSM, and CHI, where he received a best paper nomination in 2013. His work has also been featured in press outlets such as ABC, PBS, The Seattle Times, and Fast Company.
Yu-Ru Lin is an assistant professor at the School of Information Sciences, University of Pittsburgh. Her research interests include human mobility, social and political network dynamics, and computational social science. She has developed computational approaches for mining and visualizing large-scale, time-varying, heterogeneous, multi-relational, and semi-structured data. Her current research focuses on extracting system-level features from big data sets, including social media data and anonymized cellphone records, to study human and social dynamics, particularly under exogenous events such as emergencies and media events. Her work has appeared in prestigious scientific venues including WWW, SIGKDD, InfoVis, ACM TKDD, ACM TOMCCAP, IEEEP, and PLoS ONE.
Subbarao Kambhampati is a professor of Computer Science at Arizona State University and the President-Elect of the Association for the Advancement of Artificial Intelligence (AAAI). He has significant research interests in information integration and social media, and has advised several graduate students in these areas. His research in these areas has been supported by ONR and three Google research awards. He was a program co-chair for AAAI in 2005, ran the AI and Web track of AAAI in 2010, served as area chair for AI & Web at IJCAI 2015, and will be the program chair for IJCAI 2016.
|8:30 - 8:35||Welcome Message |
Yuheng Hu and Yu-Ru Lin, Co-chairs
|8:35 - 9:20||Keynote Talk |
Dr. Mor Naaman, Cornell University
The Future of Systems for Current Events
Abstract: An overwhelming amount of content from real-world events is shared by individuals through social media services. These events range from major global events like an uprising or an earthquake, to local events and emergencies such as a fire or a parade; from international media events like the Oscars, to events that enjoy little media coverage, such as a conference or a music concert. This shared media represents an important part of our society, culture, and history. At the same time, this social media event content is currently fragmented across services, hard to find, and often difficult to consume due to its sheer scale. We have worked since 2008, in both research and commercial (startup) settings, to tackle these (and other) challenges in making social media information about events accessible and usable. I will discuss our early research, show how it led to the startup company I co-founded, comment on what the startup (which recently pivoted away from events) did well and where it failed, and highlight open challenges and directions for future work and research in this area.
|9:20 - 9:40||Correlation of Brand Mentions in Social Media and Web Searching Before and After Real-Life Events: Phase Analysis of Social Media and Search Data for Super Bowl 2015 Commercials|
Partha Mukherjee (Pennsylvania State University, USA) and Bernard Jansen (Qatar Computing Research Institute, Qatar)
Abstract: The integration of social media technologies with second-screen devices during the broadcasts of in-real-life events facilitates a mode of online conversation we refer to as the social soundtrack. In this research, we compute the correlations between the comments people post in the social soundtrack on various platforms (i.e., Twitter, Instagram, and Tumblr) and the terms people search for on a major web search engine (i.e., Google). The broadcast media event for this research is the Super Bowl 2015 commercials. Using statistical t-tests, we compare the correlations between the relative volume of searching, obtained via Google Trends, and the relative volume of social soundtrack postings on each of the three social media platforms for two temporal phases (Pre and Post) of Super Bowl 2015. We exclude the game day from our research because the search data lack sufficient granularity for that day. Research results show that there is no overall significant difference in phase correlation between social media and search data. However, at the individual level, some brands do show significant correlation between phases. The number of significant positive correlations between social soundtrack postings and web searches concerning brands is considerably higher than the number of significant negative correlations in both phases. The research results are important for identifying the temporal trends and the interplay between types of social media platforms and searching with respect to the sharing of brand mentions in word-of-mouth marketing. The results will eventually help retailers focus on brands with higher correlations to leverage the opportunity of electronic word-of-mouth advertising.
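The per-phase correlation step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes aligned daily volume series for a single brand, uses Pearson correlation as the correlation measure, and takes a hypothetical `game_day` index that is dropped before splitting the data into the Pre and Post phases. The t-test the authors use to compare phases is not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def phase_correlations(search, posts, game_day):
    """Correlate search volume with post volume separately for the Pre and
    Post phases. `search` and `posts` are assumed to be aligned daily series
    for one brand; the hypothetical `game_day` index is excluded, mirroring
    the abstract's exclusion of game-day data."""
    if len(search) != len(posts):
        raise ValueError("search and posts must cover the same days")
    pre = pearson(search[:game_day], posts[:game_day])
    post = pearson(search[game_day + 1:], posts[game_day + 1:])
    return pre, post
```

For example, `phase_correlations([10, 20, 30, 99, 40, 30, 20], [1, 2, 3, 50, 2, 3, 4], 3)` returns approximately `(1.0, -1.0)`. Real inputs would be relative volumes, such as those reported by Google Trends, rather than toy numbers.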
|9:40 - 10:00||Geolocated Twitter Panels to Reduce Selection Bias for Event Studies|
Han Zhang (Princeton), David Rothschild (MSR NYC), Shawndra Hill (MSR NYC)
Abstract: Data from Twitter have been employed in prior research to study the impact of events. Historically, researchers have relied on keyword-driven samples of tweets to create a cross-section of the Twitter conversation about specific events by searching for event-related keywords during and after the events. There are several limitations to the keyword-based approach. First, the technique suffers from selection bias, since users who discuss the event are already more likely to discuss event-related topics beforehand; it is not clear whether observed impacts are merely driven by the set of users who are intrinsically more interested in events. Second, within-subject analysis is not feasible with a keyword-selected cross-sectional approach. Third, there are no viable comparison groups for a keyword-selected cross-section sample. We propose an alternative approach to study the response to events on Twitter that addresses the aforementioned issues. We construct panels of users defined by their demographics, including geolocation. These panels are exogenous to the keywords in their tweets, which results in less selection bias than the keyword cross-section. Geolocated Twitter panels allow us to follow within-person changes over time and enable the creation of comparison groups. We test our approach in two real-world settings: response to TV advertising and response to mass shootings. We illustrate how our approach limits the selection bias introduced by the keyword selection approach, how the same set of users shifts discussion before and after an event, and how geography can provide meaningful comparison groups for the impact of these events. We believe that we are the first to provide a clear empirical example of both how a better sample design reduces selection bias compared to the current state of the art and how it increases the value of Twitter research for studying events.
|10:00 - 10:15||Coffee break|
|10:15 - 10:35||A Human-Machine Collaborative System for Identifying Rumors on Twitter|
Soroush Vosoughi and Deb Roy (MIT)
Abstract: The spread of rumors on social media, especially in time-sensitive situations such as real-world emergencies, can have harmful effects on individuals and society. In this work, we developed a human-machine collaborative system on Twitter for fast identification of rumors about real-world events. The system reduces the amount of information that users have to sift through in order to identify rumors about real-world events by several orders of magnitude.
|10:35 - 10:55||Newsworthy Rumor Events: A Case Study of Twitter|
Armineh Nourbakhsh (Thomson Reuters), Xiaomo Liu (Thomson Reuters), Sameena Shah (Thomson Reuters), Rui Fang (Thomson Reuters), Mohammad Ghassemi (MIT), and Quanzhi Li (Thomson Reuters)
Abstract: Rumors differ in how and where they originate, what topics they address, the emotions they invoke, and how they engage their audience. In this study, we analyze various semantic aspects of rumors and inspect their origination and propagation patterns. Using Twitter as a case study, we develop a framework to characterize rumors. Our characterization covers intrinsic and extrinsic factors at both the tweet and event level, as well as usage analysis. We determine the roles that various user types play and analyze rumor propagation from both a retweeting and a burstiness perspective.
|10:55 - 11:15||Outlier Detection and Trend Detection: Two Sides of the Same Coin|
Erich Schubert, Michael Weiler, and Arthur Zimek (Ludwig-Maximilians-Universität München, Germany)
Abstract: Outlier detection is commonly defined as the process of finding unusual, rare observations in a large data set, without prior knowledge of which objects to look for. Trend detection is the task of finding some unexpected change in some quantity, such as the occurrence of certain topics in a textual data stream. Many established outlier detection methods are designed to search for low-density objects in a static data set of vectors in Euclidean space. For trend detection, high-volume events are of interest and the data set is constantly changing. These two problems appear to be very different at first. However, they also have obvious similarities. For example, both trends and outliers are supposed to be rare occurrences. In this paper, we discuss the close relationship between these tasks. We issue a call to action to investigate this relationship further and to carry over insights, ideas, and algorithms from one domain to the other.
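The abstract's observation that a trend is a rare, high-volume change can be made concrete with a simple rule. The rule treats a burst in a per-slot topic count as an outlier relative to the slot's recent history. The Python sketch below uses a one-sided trailing-window z-score. This rule, the `window` size, and the `threshold` are illustrative assumptions, not a method proposed in the paper.

```python
from statistics import mean, pstdev


def bursty_slots(counts, window=5, threshold=3.0):
    """Return indices of time slots whose count is an upward outlier relative
    to the preceding `window` slots, using a one-sided z-score.

    `window` and `threshold` are illustrative defaults, not tuned values.
    """
    flagged = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma > 0:
            score = (counts[t] - mu) / sigma
        else:
            # A perfectly flat history: any increase counts as a burst.
            score = float("inf") if counts[t] > mu else 0.0
        # Only high-volume changes are of interest for trend detection.
        if score > threshold:
            flagged.append(t)
    return flagged
```

For instance, `bursty_slots([5, 6, 5, 4, 5, 6, 5, 40, 6, 5])` flags only slot 7. Viewed this way, the trend detector is an outlier detector applied along the time axis, with the recent history serving as the reference density.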
|11:15 - 11:35||Event Detection from Millions of Tweets related to the Great East Japan Earthquake using Feature Selection Technique|
Takako Hashimoto (Chiba University of Commerce, Japan), Dave Shepard (UCLA), Tetsuji Kuboyama (Gakushuin University, Japan), and Kilho Shin (University of Hyogo, Japan)
Abstract: Social media offers a wealth of insight into how significant events, such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing, affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection that leverages a fast feature selection algorithm called CWC. Starting from word count vectors of authors and words for each time slot (in our case, every hour), we extract discriminative words from each slot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance between each time slot and the initial point remains high while an event is happening, yet declines sharply when the event ends, offering an accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrate our method's effectiveness, we extract events from a dataset of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. With CWC, we can identify events from this dataset with great speed and accuracy.
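The distance-from-the-initial-point idea can be sketched in Python as shown here. This is not the authors' implementation, and it makes several assumptions:

- Per-slot word counts have already been reduced to the CWC-selected words. CWC itself is not implemented.
- The author dimension is collapsed into plain word counts.
- Slot 0 serves as the initial point.
- Distance is Euclidean.
- An event span is any run of slots where the distance stays above a hypothetical `threshold`.

The paper's actual distance measure and span rule may differ.

```python
from math import sqrt


def distance_series(slot_vectors):
    """Euclidean distance of each time slot's word-count vector from slot 0.

    `slot_vectors` is a list of dicts (word -> count), one per time slot,
    assumed to be already restricted to the discriminative words selected
    by a feature-selection step such as CWC (not implemented here).
    """
    if not slot_vectors:
        return []
    origin = slot_vectors[0]
    series = []
    for vec in slot_vectors:
        words = set(origin) | set(vec)
        series.append(sqrt(sum((vec.get(w, 0) - origin.get(w, 0)) ** 2 for w in words)))
    return series


def event_spans(series, threshold):
    """Return (start, end) index pairs of maximal runs where the distance
    stays above `threshold` (a hypothetical cut-off)."""
    spans, start = [], None
    for i, d in enumerate(series):
        if d > threshold and start is None:
            start = i
        elif d <= threshold and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(series) - 1))
    return spans
```

Consider the hourly slots `[{"quake": 1}, {"quake": 5, "tsunami": 4}, {"quake": 6, "tsunami": 3}, {"quake": 1}]`. For these, `event_spans(distance_series(slots), 2.0)` returns `[(1, 2)]`: the distance rises while the burst lasts and drops back to zero when the vocabulary returns to its starting state.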
|11:35 - 11:55||Fast Community Discovery and Its Evolution Tracking in Time-evolving Social Networks|
Yao Liu, Hong Gao, Xiaohui Kang, Qiao Liu, and Zhiguang Qin (University of Electronic Science and Technology of China, China)
Abstract: In the real world, social networks are large-scale, noisy, and evolving. Communities are inherent characteristics of human interaction in social networks. Tracking evolving communities in dynamic social networks has become an increasingly important research topic. Several classic incremental clustering and evolutionary clustering algorithms have been proposed, but they all face the problem of balancing running time against clustering quality. In this paper, we propose a fast incremental community evolution tracking (FICET) framework to discover communities and track community evolution in both slowly and highly evolving networks. For higher clustering quality, this framework identifies communities using not only the current network data but also the prior community structures. For shorter running time, this framework uses a subgraph-by-subgraph incremental method and introduces core subgraphs to infer core communities. Through the introduction of core subgraphs, we can quickly capture community evolution events such as forming, dissolving, and evolving. Experiments on a series of synthetic and real-world datasets demonstrate that this framework improves both clustering quality and time performance compared with state-of-the-art frameworks.
|11:55 - 12:15||Toponym Recognition in Social Media for Estimating the Location of Events|
Meryem Sagcan and Pinar Karagoz (Middle East Technical University, Turkey)
Abstract: The prominence of social media such as Twitter and Facebook has led to a huge collection of data over which event detection provides useful results. An important dimension of event detection is location estimation for detected events. Social media provides a variety of clues for location, such as geographical annotation from smart devices, the location field in the user profile, and the content of the message. Among these clues, message content needs more effort to process, yet it is generally more informative. In this paper, we focus on the extraction of location names, i.e., toponym recognition, from social media messages. We propose a hybrid system that uses both rule-based and machine learning-based techniques to extract toponyms from tweets. Conditional Random Fields (CRF) are used as the machine learning tool, and features such as part-of-speech tags and a conjunction window are defined in order to construct a CRF model for toponym recognition. In the rule-based part, regular expressions are used to define some of the toponym recognition patterns as well as to provide a simple level of normalization that handles the informality of the text. Experimental results show that the proposed method achieves a higher toponym recognition ratio than previous studies.
|12:15 - 12:35||Automatic Visual Analysis of Real-World Events Covered By Social Media Using Convolutional Neural Networks|
Henning Hamer, Andreas Merentitis, Nikolaos Frangiadakis, and Sergey Sukhanov (AGT Group (R&D) GmbH, Germany)
Abstract: This paper investigates how well real-world events can be characterized by visual features detected in related images posted on social media, using state-of-the-art computer vision methods for object detection and classification. Over 48k images from four different events have been processed to detect objects of different types using convolutional neural networks (CNNs) and cascaded classifiers. Based on these object detections, we train different classifiers to rank the object types supporting the respective event and to discriminate images of an event from other images. Possible applications include 1) finding images of a certain event in a semi-automatic way, and 2) classifying the type of an event.
|12:35 - 12:40||Discussion and Closing Remarks|
Yuheng Hu and Yu-Ru Lin, Co-chairs
Camera-Ready & Registration
1. About Camera-Ready (1) The camera-ready papers should be submitted no later than Sep. 10. (2) The "Author's Final Paper Formatting and Submission Instructions" Webpage (Online Author Kit) can be found at http://www.ieeeconfpublishing.org/cpir/authorKit.asp?Facility=CPS_Dec&ERoom=ICDMW+2015 . (3) The paper length limit for workshop camera-ready papers is 8 pages maximum. Papers with more than 8 pages will be charged for extra pages.
2. About Registration (1) When you register for your paper(s), please enter your paper ID so that we can check your registration status. Only registered papers will be included in the workshop proceedings. (2) Each paper has two paper IDs: the Submission ID, assigned when authors submit the paper for review, and the Camera-Ready ID, assigned when they upload the camera-ready version. Use the Submission ID for registration.
3. About Workshop Schedule (1) The workshops will be held all day on Nov. 14. (2) The morning session is 8:30-12:30, and the afternoon session is 14:00-18:00.