Dear Ken,
Thanks for your messages. I agree with a lot of what you wrote, and I can
see what you are saying. At other times, it feels as if we are on different
worlds on these issues!
Stepping back a little, the core of this email exchange from my point of
view (and of my post to Don) is the limiting boundary behaviours of
theories, methods and approaches to problems and designs (what in maths
might be called 'boundary conditions').
From this perspective, useful questions are:
'Exactly how far can we validly use a method, theory, approach etc?'
'What are the limits beyond which a method, theory or approach does not
work?'
'Is this because it is too inefficient, because it is not valid, or because
there is some biological/cognitive limitation, etc.?'
'What criteria can we use to identify the limits of application of
particular theories, methods or approaches?'
There are many boundaries and edges to the use of methods, theories and
approaches to analysis and design.
One that has been of particular interest to me is the boundary due to the
biological limits of human thinking/feeling/intuition: the limit on the
complexity of situations whose dynamic outcome behaviours we can predict.
There are many real and absolute boundaries to human
thinking/feeling/intuition, creativity and design skills. Just one of them
is the two-feedback-loop limit I identified. It isn't the only boundary.
Another, rather more obvious, one is the boundary between something being
merely very difficult and its being intrinsically impossible. There are many
other obvious boundary conditions for human activities still to be
identified and defined.
Another boundary/edge limitation that emerged out of this email discussion
(and I'd love to get time to look at it) concerns those situations where not
only is the amount of data too big to look at, but the time and resources
needed to process the data into a form that is within human ability are also
too big. A simple example is the limit at which not only is a lifetime
shorter than the time needed to listen to all the music on an iPod, but
there is also insufficient time in one life to scroll through the options or
search for a track, and not enough human memory to remember all the tracks.
This suggests there is probably an amount of music that marks a human
boundary, and that one can possibly find a simple rule to approximately
identify that amount. Thirty years ago, when running a jazz club, I employed
a blind pianist who played jazz and classical music. It was an interesting
calculation to compare the number of bits of data in all the symphonies he
knew with the number of neurons in a typical brain. It appeared he was
getting close to the limit! It's a similar kind of assessment that leads to
the idea of a retina display and its limiting pixel count.
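A minimal sketch of a simple rule of this kind for the music example. The
track length, listening hours per day and lifespan are illustrative
assumptions, not measurements, and the function names are made up for the
example:

```python
# Illustrative assumptions only: none of these figures come from measurement.
AVG_TRACK_MINUTES = 4          # assumed average track length
LISTENING_HOURS_PER_DAY = 16   # assumed waking hours spent listening
LIFETIME_YEARS = 80            # assumed lifespan


def max_tracks_in_lifetime(track_minutes=AVG_TRACK_MINUTES,
                           hours_per_day=LISTENING_HOURS_PER_DAY,
                           years=LIFETIME_YEARS):
    """Upper bound on how many tracks one person could hear once each."""
    listening_minutes = years * 365 * hours_per_day * 60
    return listening_minutes // track_minutes


def exceeds_human_boundary(library_size, **assumptions):
    """True if the library holds more tracks than a lifetime allows."""
    return library_size > max_tracks_in_lifetime(**assumptions)


if __name__ == "__main__":
    print(max_tracks_in_lifetime())
    print(exceeds_human_boundary(10_000_000))
```

Changing any of the three assumptions moves the boundary, but only by a
constant factor; the point is that the boundary exists and is easy to
estimate.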
Where I think we agree:
1. I agree with you that a basic review of large datasets consists of
selecting useful subsets that give information. This is the kind of data
mining that is a comparative cost-benefit analysis on different criteria,
i.e. 'do you get a bigger bang for your buck contacting this group or that
group?'.
2. I agree that without further analysis and information, it is not
possible to identify whether the data analysis done by Obama's team is
intrinsically impossible without some form of mathematical analysis - or
whether you could simply take lots of people looking at the data who do no
calculations on it and can
3. I agree that the purpose of most mathematical modelling and analysis is
to identify a representation of information that is simple enough for people
to understand and make decisions about. Converting a billion data points
into identifying that one group will make higher contributions when an
appeal letter is signed by Joe Biden rather than Obama brings decision
making into a simple enough form that humans can manage.
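The kind of comparison described in points 1 and 3 can be sketched with
nothing more than division. This is only an illustration: the variant names
and all the figures are invented for the example and are not campaign data.

```python
def return_per_contact(total_donated, letters_sent):
    """Average amount raised per letter sent."""
    return total_donated / letters_sent


def better_variant(results):
    """Pick the variant with the highest return per contact.

    results maps a variant name to (total_donated, letters_sent).
    """
    return max(results, key=lambda name: return_per_contact(*results[name]))


# Hypothetical figures, invented purely for illustration.
example = {
    "letter_A": (12_000.0, 10_000),
    "letter_B": (15_000.0, 10_000),
}

if __name__ == "__main__":
    print(better_variant(example))
```

However large the underlying dataset, the result handed to the decision
maker is a single name.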
Where I feel we disagree:
1. I suggest that the cutting edge of big data analysis has moved on from
establishing data patterns, which was the cutting edge one or two decades
ago. That is the approach you described in your magazine example. I agree it
is still widely used for many problems, including identifying the best
advertising spend and, in design, identifying the best changes to web pages
(the A/B comparison). It remains a simple procedure that is relatively easy
to understand and can in theory be undertaken with nothing more
mathematically complex than an adding machine.
2. The kind of theory I see as closer to the cutting edge of the data mining
arena is predictive modelling and dynamic systems modelling. Again, both
simply convert data into a form that lets humans try out ideas more easily
and reliably. I agree absolutely that at this stage the main process in
dealing with these big complicated situations is for humans to try out
human-generated ideas against the models of the data, rather than for the
models to generate the best solution. In most areas the latter is a work in
progress rather than the state of the art.
Listed below are some weblinks to articles of the sort that I see as being
towards the cutting edge of data mining using predictive analytics and
system dynamics modelling. As you will see, the dates suggest these came
into use some years ago.
3. Obama's team clearly used predictive modelling and machine learning
modelling rather than concentrating only on simple pattern identification. I
suggest the process is described in Time as 'data mining' to make the ideas
easier to read. I've attached the job vacancy advertisement for that team on
Obama's staff, and it clearly asks for predictive modelling and machine
learning modelling skills. Interestingly, this suggests they didn't use
system dynamics or agent-based types of modelling. You can, however, do
similar things with predictive modelling and analysis.
4. (Probably the greatest disagreement.) I suggest that predicting voter
behaviour well does require feedback-loop-based dynamic modelling
(regardless of whether Obama's team used it or not). When I create even the
simplest models of the effects of political influence on voter behaviour,
modelling the influences of feedback loops seems to be essential. Standing
back, this is not surprising. The US as a country would be expected to
behave like any other very large organisation, and theories of single,
double and (if Obama's team are really good) triple loop learning would be
expected to apply as a result of a combination of previous political action
and current campaign presentations and debates.
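As an illustration only, a feedback-loop model of this general kind might be
sketched as below. The two loops, the parameter names and their values are
all assumptions made up for the example; it is not a reconstruction of any
actual campaign model.

```python
def simulate_support(initial_support=0.45, steps=50,
                     word_of_mouth=0.3, backlash=0.2):
    """Share of voters (0..1) supporting a candidate over discrete steps.

    Reinforcing loop: supporters persuade non-supporters (word of mouth).
    Balancing loop: the larger the support, the more backlash erodes it.
    All parameters are hypothetical.
    """
    support = initial_support
    history = [support]
    for _ in range(steps):
        gain = word_of_mouth * support * (1.0 - support)  # reinforcing loop
        loss = backlash * support * support               # balancing loop
        support = min(1.0, max(0.0, support + gain - loss))
        history.append(support)
    return history


if __name__ == "__main__":
    trajectory = simulate_support()
    print(round(trajectory[-1], 3))
```

In this toy model, gain equals loss when support reaches
word_of_mouth / (word_of_mouth + backlash), so the long-run outcome is set by
the balance of the two loops rather than by the starting level of support.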
I know you have good access to some things that are almost impossible for us
mere mortals to reach. If you get more detailed information on Obama's
team's methods, please could you post it?
Best wishes,
Terry
===
Dr Terence Love
Social Program Evaluation Research Unit
Edith Cowan University
Joondalup, Western Australia 6027
Mob: +61 (0) 434 975 848
===
Weblinks:
http://jasss.soc.surrey.ac.uk/7/4/6.html
http://web.eecs.umich.edu/~baveja/Papers/learnhgmm-AAMAS.pdf
http://tercer.bol.ucla.edu/papers/turnout.pdf
http://www.nber.org/papers/w10748.pdf?new_window=1
http://assets.cambridge.org/97805216/62222/frontmatter/9780521662222_frontmatter.pdf
http://www.cs.bham.ac.uk/~jer/papers/linear.pdf
http://www.systemdynamics.org/conferences/2011/proceed/papers/P1032.pdf
http://www.knowledgeminer.com/pdf/mining.pdf
http://www.systemdynamics.org/conferences/2010/proceed/papers/P1214.pdf
http://www.statsoft.com/textbook/data-mining-techniques/
http://userwww.service.emory.edu/~dlinzer/Linzer-prespoll-May12.pdf
http://gigaom.com/data/big-data-politics-why-you-cant-outrun-campaigns-by-avoiding-the-tv/
Obama's data mining team
(http://www.kdnuggets.com/jobs/11/07-13-obama2012-predictive-modeling-data-mining-scientists-analysts.html)
used predictive modelling - http://www.dtreg.com/
==
The advert for Obama's team: Predictive Modeling and Data Mining
Scientists/Analysts
http://www.kdnuggets.com/jobs/11/07-13-obama2012-predictive-modeling-data-mining-scientists-analysts.html
analyze the campaign data to guide election strategy and develop
quantitative, actionable insights that drive our decision-making. Looking
for people at both the senior and junior level to join the campaign
Analytics Dept through November 2012.
Company: Obama 2012 Presidential Campaign
Location: Chicago, IL
Web: www.barackobama.com
The Obama for America Analytics Department analyzes the campaign's data to
guide election strategy and develop quantitative, actionable insights that
drive our decision-making. Our team's products help direct work on the
ground, online and on the air.
We are looking for Predictive Modeling/Data Mining Scientists and Analysts,
at both the senior and junior level, to join our department through November
2012 at our Chicago Headquarters. We are a multi-disciplinary team of
statisticians, predictive modelers, data mining experts, mathematicians,
software developers, general analysts and organizers - all striving for a
single goal: re-electing President Obama.
Using statistical predictive modeling, the Democratic Party's comprehensive
political database, and publicly available data, modeling analysts are
charged with predicting the behavior of the American electorate. These
models will be instrumental in helping the campaign determine which voters
to target for turnout and persuasion efforts, where to buy advertising and
how to best approach digital media.
Our Modeling Analysts will dive head-first into our massive data to solve
some of our most critical online and offline challenges. We will analyze
millions of interactions a day, learning from terabytes of historical data,
running thousands of experiments, to inform campaign strategy and critical
decisions.
Responsibilities include:
* Develop and build statistical/predictive/machine learning models to
  assist in field, digital media, paid media and fundraising operations
* Assess the performance of previous models and determine when these
  models should be updated
* Design and execute experiments to test the applicability and
  validity of these models in the field
* Create metrics to assess performance of various campaign tactics
* Collaborate with the data team to improve existing database and
  suggest new data sources
* Work with stakeholders to identify other research needs and
  priorities
Required Experience:
* B.S degree (M.S/PhD for scientist and senior positions) in
  statistics, machine learning, mathematics, quantitative methods, computer
  science, or related field
* Experience with political, Nielsen/Arbitron, fundraising or digital
  media & online advertising data
* Application of advanced statistical, machine learning, and/or data
  mining techniques (i.e. classification, clustering, association mining,
  forecasting), to real-world problems with massive data
* Experience with text data, search, natural language processing,
  social media analytics is a plus - we're also hiring for text mining
  positions.
* Proven creativity and problem-solving skills
Required Software:
Applicants must have demonstrated, extensive experience (professional or
academic) with any major statistical or data mining package (R, STATA, SPSS,
SAS, Enterprise Miner, Matlab, KNIME, Weka). Other desired software skills
would include:
* Any SQL-based query language (MySQL, PostgreSQL, etc.)
* Programming skills desirable but not required for all positions (C#,
  C++, Java, Python, Ruby, Perl)
* Strong MS Excel skills also desired
_Contact_:
Please send resumes to [log in to unmask] and mention kdnuggets.
Obama for America is committed to diversity among its staff, and recognizes
that its continued success requires the highest commitment to obtaining and
retaining a diverse staff that provides the best quality services to
supporters and constituents. Obama for America is an equal opportunity
employer and it is our policy to recruit, hire, train, promote and
administer any and all personnel actions without regard to sex, race, age,
color, creed, national origin, religion, economic status, sexual
orientation, veteran status, gender identity or expression, ethnic identity
or physical disability, or any other legally protected basis. Obama for
America will not tolerate any unlawful discrimination and any such conduct
is strictly prohibited.
-----------------------------------------------------------------
PhD-Design mailing list <[log in to unmask]>
Discussion of PhD studies and related research in Design
Subscribe or Unsubscribe at https://www.jiscmail.ac.uk/phd-design
-----------------------------------------------------------------