[Data Summit - Keynote Speech 2011] - A New Kind of Science: The NKS Forum
Data Summit - Keynote Speech 2011
Posted by: Dan Ellwein
Keynote talk given at the Wolfram Data Summit in Washington, DC on Thursday, September 8, 2011
Here are some snippets from Stephen Wolfram's speech:
the tower of technology we've ended up building in Wolfram|Alpha
a fluke of circumstances let us be able to build it
incredible possibilities
effects we can have in democratizing knowledge
everything everyone everywhere
type in a query
compute an answer
four big pieces - data algorithms linguistics presentation
systematically working to curate and include data about every possible domain
finding real good sources is absolutely critical
understanding those sources is critical
data sources - it's a very brutal business
very strong tools for looking at the data
remarkably hard to predict what the quality of data will be
it's a significant problem that data just isn't in digital form
it will inevitably take all kinds of effort
just getting the data is only 10% of the problem
90% is making the data computable
done a huge amount of automation of that process
developed some excellent management procedures
the bottom line - humans have to be involved
databases do not speak for themselves
once that expert knowledge has been captured - one can compute from the data
what is involved in computing answers - it will be algorithms
beyond data - is implementing algorithms methods models
algorithms methods models - part of the knowledge base of civilization we're trying to capture in systematic permanent form
cannot be represented with a structure like a database
needs the full infrastructure of computation mathematics
the Mathematica language can represent this
natural language processing - it has brought us search engines and IBM's Watson Jeopardy machine
these have been about grinding up text and trying to match fragments
no attempt to know in any fundamental sense what the text means
to compute an answer you have to understand the question
the computational universe of possible programs
by having the right kinds of little programs interacting it might be possible to understand natural language
It's complicated stuff
big mixture of fundamental algorithms
lots of little theories
lots of human effort
but in the end
we are able to systematically go from natural language
to a precise symbolic internal representation of questions
It's like we're parsing pretty raw human thoughts
And turning them internally into little pieces of Mathematica code
billions of queries... so we can use that to slowly learn the language of human thoughts
what should it choose to compute?
What should it display?
How should it organize what it displays?
this too is a difficult problem
when we start on some domain the first outputs are pretty useless
not organized to be useful to humans
automated heuristics
design goals
human experts
arrange these so they are easy for humans to assimilate
like other parts of the technology stack - this tends to be pretty unforgiving
the good news - success is proportional to effort put into a particular domain
it makes one feel good — because it means that effort is adding value
bunch of different groups - development processes that thread throughout
- main content groups -
socioeconomic
geographic
sci-tech
medical
financial
math
algorithmic
cultural
consumer
- eclectic team of experts -
very diverse in terms of types of work
challenge is to - figure out what models make sense for a particular kind of computation
how to do the right statistics for some particular kind of result
there is usually a lot of judgment involved with our content experts
beyond the core vertical content groups there are a bunch of horizontal groups
a general frameworks group - for handling dates and times
a core parser development group - builds the system for understanding inputs: an endless mixture of small-scale pieces
micro-algorithms to represent all sorts of features of language
static data in our data cloud
real-time feeds and APIs
content-area-knowledgeable people - take things about data flagged by automated systems and resolve them
a growing outside network of volunteers
doing quite a bit of primary data research on data that hasn't already been aggregated
Wolfram|Alpha is a pretty complicated thing...
design analysis group - maintains standards for what Wolfram|Alpha results should be like
integrated graphic design group - maintains standards for visual appearance
user experience group - figures out issues for the operation of Wolfram|Alpha
resolves knottier information presentation issues
Advanced Research Group - takes crazy ideas of mine and turns them into features of Wolfram|Alpha
there's the team that actually handles the practicalities of building the big system that is Wolfram|Alpha
there's an operations group, that does 24/7 monitoring of our systems
there's a quality analysis group, that takes in user feedback and analyzes it
there's our quality assurance group...
With our work on Mathematica over the years
we've set high standards for what QA can achieve
with sufficient automation and cleverness
Wolfram|Alpha is a really complicated thing to test...
new data and feeds are flowing in all the time - that gets built every hour into a candidate new Wolfram|Alpha
every single week - we push a new version of the Wolfram|Alpha codebase
And so far — touch wood — at two years and counting our QA process has been essentially flawless
I don't know what to call it all—perhaps "computational knowledge engineering"...
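
A reader's illustration, not from the speech: the snippets about going "from natural language to a precise symbolic internal representation" and then computing an answer can be sketched as a toy in Python. Everything here is an assumption made for illustration (the `Expr` type, the `parse` and `evaluate` names, the three-word arithmetic grammar); it says nothing about how Wolfram|Alpha actually works internally.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A symbolic expression: a head applied to arguments (hypothetical type)."""
    head: str
    args: tuple


# Hypothetical micro-grammar: maps a few English words to symbolic heads.
WORD_TO_HEAD = {"plus": "Plus", "minus": "Subtract", "times": "Times"}

# Accepts inputs such as "What is 3 plus 4?" or "6 times 7".
QUERY = re.compile(r"^(?:what is\s+)?(\d+)\s+(plus|minus|times)\s+(\d+)\??$")


def parse(query: str):
    """Turn a natural-language query into an Expr, or None if not understood."""
    match = QUERY.match(query.strip().lower())
    if match is None:
        return None
    left, word, right = match.groups()
    return Expr(WORD_TO_HEAD[word], (int(left), int(right)))


# The "algorithms" piece: how each symbolic head is computed.
OPERATIONS = {
    "Plus": lambda a, b: a + b,
    "Subtract": lambda a, b: a - b,
    "Times": lambda a, b: a * b,
}


def evaluate(expr: Expr):
    """Compute an answer from the symbolic representation."""
    return OPERATIONS[expr.head](*expr.args)


if __name__ == "__main__":
    expr = parse("What is 6 times 7?")
    print(expr)
    print(evaluate(expr))
```

The point of the split is the one made in the talk: the question is first understood as a precise symbolic form, and only then is an answer computed from it. A query the toy grammar does not cover simply returns `None` from `parse`.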