Computational Approaches to Analysing Weblogs

Wednesday, March 29, 2006

Symposium complete


Well - these were three fun and informative days! So, now that we are the best informed group of people on matters bloggy and bloglike - how can we best find ways to further our collective insights and competence?




Meteorological note: record precipitation this March. We should print a T-shirt.
Rain rain rain
(Pic from flickr.)



/ Jussi

Monday, March 27, 2006

Panel Discussion - Day 1

Check out an additional summary of the panel discussion (Day 1 - March 27) at EBB.

Computational Approaches to Analysing Weblogs

It seems to me that an important issue has barely been touched on: a number of studies have shown that bloggers tend to organize themselves into smallish communities of like-minded individuals, reducing (or devaluing) interactions with people who hold different opinions. How does this affect the evolution of opinions in the blogosphere? Do they tend to get more extreme/polarized? Should we not make it a research priority to explore how the opinions of bloggers relate to those of non-bloggers? Not all people with potential access to the blogosphere blog, so this would seem to be a key question.

Also, let's try to sign posts here - "sourced intelligence", right? :-)

Shlomo Argamon

Semantic web

Microformats, carrying data: blogs are a good testing ground.

Last question - what problems will the research community need to address?

Chris - growth of amount of content and dynamic nature of content, metadata not very much at this stage. Each panel member comments on challenges and opportunities:

Carrie - web search and other types of search. When people are looking for stuff, can we really understand who the customer is and what blogs they are interested in? How can we do this in not only a qual way? For example: these are my interests, blog sites, recommendations on doctors, restaurants etc. With 9 million LiveJournal users, the hardest problem is how do I discover the community and find the ones right for me. Relevancy - don't know from a marketing perspective if we will fit in the alt.rock box. Diversity of info: how do we get better tools to regulate content? Flickr, del.icio.us: metrics don't really exist to evaluate community. World is flat: email, cell phone, virtualisation, hoteling. Having a group employee blog is a great way to mine a group of loyal company individuals. A ton of information.

Better info about feeds, video, podcasts that a consumer can now create. There isn't anything we can't do with language processing, but we are limited by processing: too complicated, if only we can simplify and speed up the process. Need to push envelopes better, accuracy and speed.

Great moderation from Mark Liberman.

future predictions contd.

Closer interaction between blogging and social networks; they will become integrated, kind of like Yahoo! 360. I will dictate who will see what.

5 year prediction beyond social and legal

What will experiences be like 5 years from now? Bifurcation between bloggers that want to be public and those that don't. Split will be amplified. The media will morph again and again. Shift of power: how do we avoid the closing off?

Panel hard at work

Blog symposium panel
From flickr.

Uses and future of consumer

People realising that info is actually public may lead to a backlash from consumers if the usage is abused. Definition of blogs is that they want the world to see the information. Recognition that my opinion is worth something. How do I get to share in this? Interesting frontier to see how consumers will want a quid pro quo.

Panel - What industry needs from researchers & what will consumers be doing with blog data

Consumer intelligence qual/quant mix. A number of verticals depend on blogs, e.g. video gaming. 5 years from now this data will be currency just like IRI data or Nielsen. How sales volumes are going, opinion scores: these kinds of info will be a currency 5 years from now. Will want to know what the consumer generated rating will be.

Tony speaks to qual side

90% of what is on his media site comes from users. People spend 9 mins on average on Always On. Not religious as a movement that represents the power of the voice. Always On is a multiple voice where members enjoy each other's posts. Paul from Intel. Going-On is a content management system with a built-in social network. Products, music, life decisions etc.: traditional marketeers have no penetration. IM generation puts higher value on blog content.

Moving on: blog analysis cf. other methods

Quant/qual or both.

High level: when you are looking at market research, looking at the end goal is important. Focus groups are biased. Much faster response rate, so large clients want this info immediately as shelf life diminishes rapidly, e.g. on new products. Look at this info as word of mouth, with a bubbling effect so the best info gets picked up most often. Trendcaster: look at data on hydrocarbon but by looking at consumer generated data (unaided or less aided). Certain metrics, e.g. advocacy, will give direction, but there is much more quantitative that can be layered with qual. Is this being picked up in mainstream? Combination of qual and quant: correlating both gives interesting info. Cameron doesn't work with blog search at Yahoo; focus on customer requirements is about understanding context. Understanding of community modelling.

Errors

My head hurts - not sure where this discussion on errors is going so I suspect it will finish soon. Six Apart got some laughs. Create blogs from high value words and generate revenues.

Panel & Spam blogs

Not just spam from automatic generation of link farms but also stealth marketers. How robust are applications, what are the error rates, how do you rank sources? Carrie comes at this from Google as a statistician, so your data is always dirty. Issues with data quality. Areas have been proactive in cleaning but total junk can always exist at the backend. How can we use this data? Jaguar (the car, Mac OS and animal etc.). From a statistical point of view, distribution of data. Not conventional dirty data, so all this power: hurricane is Katrina, Olympics is right now. Need to understand how the population is addressing these issues ???

Feedster view

Feeds don't provide additional info. What is coming along in the future? The feeds index grew 5 fold last year from 5 million to 30 million feeds. Ability to qualify and quantify is a big challenge. Relevancy is more and more important. Bloggers meander around topics, so can find a common theme. Feedster is doing more and more syndication, identifying conversations overlaid with news stories. Can be a world news story, sports etc. Simple example: Boston Globe wanted fans to comment on a Boston Red Sox game. Read last night's story and see what the fan base was saying. Difficult to deal with dynamic content. Globe is a conservative property, so they didn't cut out much, surprisingly. A few years ago editors would not have allowed consumer generated content.

Cymfony perspective

Do different things and see the value in information. Focus on marketing communications; have 5 years of mainstream data. Look at different types of info beyond blogs. Blogs tend to be more negative. Take in unstructured info from newsboards. Convergence between mainstream media and consumer generated. Influential bloggers drive mainstream media. Journalists now do research on consumer generated media. Lots of spam on blogs, so difficult to have automated approaches.

Panel - what kind of info

Howard talks about what Umbria does, that is, provide consumer insight for B2C, but B2B could be useful. Most insights come from a question and then a focus group. Potential of bias: getting info at a point in time, and sometimes the best is a quarterly snapshot. The blogosphere is always on, to see how opinions are changing in real time. Other aspect of fascination around blogs: the vast majority is those that want to blog about themselves, school, life, work. Talking about a lousy day, then they meet at Starbucks, enjoyed a drink, then went back to a lousy day. In the middle is a stream of consciousness. Blogs let us listen to conversation over our shoulder. This ability to listen in makes it an incredibly valuable resource. Kinds of info - what are people talking about, and subtopics. Ability to layer sentiments and gain deep insights over time, to see what is driving public opinion, putting a finger on the pulse with regard to consumers.

Panel

Nothing mentioned amongst papers on semantic web.

Andrew Bernstein / Cymfony - Helps organisations extract insights
Carrie Grimes - background in anthropology and archeology
Howard Kaushansky - a lawyer! Worked on fraud detection software for telcos (Coral Systems)

Umbria applies AI to marketing

Cameron Marlow - research scientist at Yahoo, developed Blogdex, Yahoo! Berkeley new partnership

Tony Perkins - Kids today are always on
Chris Redlitz - Reebok marketing for a decade - Feedster provides feeds from millions of newsblogs - largest quality index of feeds
Michael Sippy - Six Apart products include Movable Type, TypePad and LiveJournal

So let's start with the question of what info comes from blogs: age, gender, political opinion and other features, as well as network of links

Industrial Panel - Day 1 (after lunch)

We have been waiting all day for this panel, which covers what companies will be doing with blogs and blog analysis in the future.

Members of Panel :

Andrew Bernstein - CEO, Cymfony
Carrie Grimes - Google
Howard Kaushansky - CEO, Umbria
Cameron Marlow - Yahoo
Tony Perkins - Always On
Chris Redlitz - Feedster
Michael Sippy - Six Apart

Session organised around 6 questions

What info do you get from blogs
how good is it
blog analysis vs other methods
how will companies analyse blogs in the future
how will consumers use these analyses
what do you need from researchers

Saturday, March 25, 2006

AAAI Spring Symposium Series Info

The AAAI Spring Symposium Series will be held in the History Corner of the main quad (Building 200). Registration will be held in the foyer of Annenberg Auditorium on the lower level of the Cummings Art Building, which is across Lasuen Mall from the History building. For a map of campus, please see http://www.stanford.edu/home/visitors/maps.html.

Registration hours will be:
  • Mon, Mar 27, 8:00 am - 5:00 pm
  • Tue, Mar 28, 8:30 am - 5:00 pm
  • Wed, Mar 29, 8:30 am - 12:00 pm
Coffee breaks will be held in the Citrus Courtyard, or in the Cummings Art Building in case of rain. The breaks are scheduled at 10:30 am & 3:30 pm, Mon and Tue, and 10:30 am only on Wed.

Please join us for the AAAI Reception to be held in the Oak Lounge at Tresidder Union at 6:00 pm, Mon, Mar 27. Refreshments will be served.

The Plenary Session will be held Tue, Mar 28 at 6:00 pm in the Annenberg Auditorium. It is open to the public and is at no additional cost.

Parking is available at the Galvez Lot, located across the street from the Track House parking lot at Stanford Stadium. The cost is $10 for the duration of the event. A parking permit is available for purchase at the AAAI Registration desk.

Campus Eateries:
  • Peet's Coffee (7am - 8pm) 3rd floor, James H. Clark Center on 318 Campus West Drive
  • The Cafe (8am - 7pm) Arrillaga Alumni Center
  • Cubberley Cafe (8am - 3pm) Cubberley Education Building
*********************************

If you have any further questions, please feel free to contact us at sss06@aaai.org. We look forward to seeing you next week at Stanford!

Carol Hamilton, Executive Director
--
Nicolas, Franco, Jim & Mark

Friday, March 24, 2006

Re: Presentation equipment

Symposium participants will have to bring their own laptops or borrow from someone in the group. AAAI does not provide laptops, only an LCD and a regular overhead projector. In any case many people will have their machines with them and we will work something out on site.

It might be worth bringing your presentation on a memory stick and, just in case, making it available on the web.

Nicolas, Franco, Jim & Mark

Wednesday, March 22, 2006

Presentation equipment.

I have a question regarding the presentation equipment. Are the presenters supposed to bring laptops for presentation? Or is the conference room equipped with a presentation laptop?

Look forward to seeing you guys.

Xin Li

Tuesday, March 21, 2006

Re: poster dimension specifications

The poster boards are 30" x 40" and can be placed either horizontally or vertically on the stands.

Nicolas, Franco, Jim & Mark

Friday, March 17, 2006

poster dimension specifications

I'm wondering what the required dimensions for the posters are, and if our posters should be landscape or portrait.

Thanks,
Paula Chesley