[Data Summit - Keynote Speech 2011] - A New Kind of Science: The NKS Forum




Data Summit - Keynote Speech 2011




Posted by: Dan Ellwein

Keynote talk given at the Wolfram Data Summit in Washington, DC on Thursday, September 8, 2011

here are some snippets from Stephen Wolfram's speech:


the tower of technology we've ended up building in Wolfram|Alpha

a fluke of circumstances let us be able to build it

incredible possibilities

effects we can have in democratizing knowledge

everything everyone everywhere

type in a query

compute an answer

four big pieces - data algorithms linguistics presentation

systematically working to curate and include data about every possible domain

finding real good sources is absolutely critical

understanding those sources is critical

data sources - it's a very brutal business

very strong tools for looking at the data

remarkably hard to predict what the quality of data will be

it's a significant problem that data just isn't in digital form

it will inevitably take all kinds of effort

just getting the data is only 10% of the problem

90% is making the data computable

done a huge amount of automation of that process

developed some excellent management procedures

the bottom line - humans have to be involved

databases do not speak for themselves

once that expert knowledge has been captured - one can compute from the data

what is involved in computing answers - it will be algorithms

beyond data - is implementing algorithms methods models

algorithms methods models - part of the knowledgebase of civilization we're trying to capture in systematic permanent form

cannot be represented with a structure like a database

needs the full infrastructure of computation mathematics

the Mathematica language can represent this

natural language processing - it has brought us search engines and IBM's Watson Jeopardy machine

these have been about grinding up text and trying to match fragments

no attempt to know in any fundamental sense what the text means

to compute an answer you have to understand the question

the computational universe of possible programs

by having the right kinds of little programs interacting it might be possible to understand natural language

It's complicated stuff

big mixture of fundamental algorithms

lots of little theories

lots of human effort

but in the end

we are able to systematically go from natural language

to a precise symbolic internal representation of questions

It's like we're parsing pretty raw human thoughts

And turning them internally into little pieces of Mathematica code

billions of queries... so we can use that to slowly learn the language of human thoughts

what should it choose to compute?

What should it display?

How should it organize what it displays?

this too is a difficult problem

when we start on some domain the first outputs are pretty useless

not organized to be useful to humans

automated heuristics

design goals

human experts

arrange these so they are easy for humans to assimilate

like other parts of the technology stack - this tends to be pretty unforgiving

the good news - success is proportional to effort put into a particular domain

it makes one feel good — because it means that effort is adding value

bunch of different groups - development processes that thread throughout

- main content groups -

socioeconomic

geographic

sci-tech

medical

financial

math

algorithmic

cultural

consumer

- eclectic team of experts -

very diverse in terms of types of work

challenge is to figure out what models make sense for a particular kind of computation

how to do the right statistics for some particular kind of result

there is usually a lot of judgment involved with our content experts

beyond the core vertical content groups there are a bunch of horizontal groups

a general frameworks group - for handling dates and times

a core parser development group - builds the system for understanding inputs: an endless mixture of small-scale pieces

micro-algorithms to represent all sorts of features of language

static data in our data cloud

real-time feeds and APIs

content-area-knowledgeable people - take things about data flagged by automated systems and resolve them

a growing outside network of volunteers

doing quite a bit of primary data research on data that hasn't already been aggregated

Wolfram|Alpha is a pretty complicated thing...

design analysis group - maintains standards for what Wolfram|Alpha results should be like

integrated graphic design group - maintains standards for visual appearance

user experience group - figures out issues for the operation of Wolfram|Alpha

resolves knottier information presentation issues

Advanced Research Group - takes crazy ideas of mine and turns them into features of Wolfram|Alpha

there's the team that actually handles the practicalities of building the big system that is Wolfram|Alpha

there's an operations group, that does 24/7 monitoring of our systems

there's a quality analysis group, that takes in user feedback and analyzes it

there's our quality assurance group...

With our work on Mathematica over the years

we've set high standards for what QA can achieve

with sufficient automation and cleverness

Wolfram|Alpha is a really complicated thing to test...

new data and feeds are flowing in all the time - that gets built every hour into a candidate new Wolfram|Alpha

every single week - we push a new version of the Wolfram|Alpha codebase

And so far — touch wood — at two years and counting our QA process has been essentially flawless

I don't know what to call it all—perhaps "computational knowledge engineering"...




