Great second day at the Data to Knowledge conference (a few notes below).
I've also been trying to move a few projects forward during the commute and breaks. I sent Alan my revisions on the prosecutor's fallacy manuscript; we're getting pretty close. Following up on a query from John, I ran a simulation showing the impact of a delay in the timeseries before the slow change towards the bifurcation begins. The delay certainly weakens power, but it does not preclude the linear-change model from detecting the change (see the sketch below). I'm also moving towards scheduling a call for the pdg-control manuscript on stochastic policy costs; some notes below.
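Here's a toy version of that simulation (my own sketch, not the actual analysis code; all parameter names and values are illustrative): an autoregressive process whose return rate `alpha` holds steady for an initial delay, then decays linearly toward the bifurcation at `alpha = 0`.

```r
# Toy sketch: OU-like process with a flat "delay" period before the slow
# approach to the bifurcation (alpha -> 0). Parameters are illustrative.
set.seed(123)
simulate_delay <- function(n = 500, delay = 200, alpha0 = 1,
                           sigma = 0.1, dt = 1) {
  # return rate is constant for `delay` steps, then declines linearly to 0
  alpha <- c(rep(alpha0, delay), seq(alpha0, 0, length.out = n - delay))
  x <- numeric(n)
  for (t in 2:n)
    x[t] <- x[t - 1] - alpha[t] * x[t - 1] * dt + sigma * sqrt(dt) * rnorm(1)
  data.frame(time = seq_len(n), x = x, alpha = alpha)
}
ts <- simulate_delay()
plot(ts$time, ts$x, type = "l", xlab = "time", ylab = "x")
```

The intuition: the flat delay period contributes no signal of change, so a model fitting a linear change over the whole series has its power diluted, though not eliminated.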
Data to Knowledge Conference, Day 2
David A. Bader, algorithm design / HPC perspective
Opportunities:
- HPC on big data (massive graphs)
- Streaming analytics: Today’s challenges aren’t dense linear algebra.
- Information visualization - more data in a graph than we have pixels
- Heterogeneous systems - a system that can reconfigure its memory hierarchy on the fly
- Energy efficiency (moving data takes energy as well as time)
This is not the time to get stuck in a single paradigm! Scientific computing != MPI; big data != Hadoop/MapReduce.
A spectrum of platforms, from external disk to internal memory:

|             | Hadoop        | traditional database | cloud/cluster | large shared memory |
|-------------|---------------|----------------------|---------------|---------------------|
| Storage     | external disk | →                    | →             | internal memory     |
| Programming | easy          | →                    | →             | harder              |
| Performance | low           | →                    | →             | higher              |
| Energy use  | high          | →                    | →             | low                 |
Latency on memory (random access to some address in a big global index) dominates over floating-point calculation. Example: the Cray XMT, with 512 processors, 128 threads/processor, 64 TB of global memory, and no cache.
Jeff Hawkins: Brain as a streaming processor
Learning isn’t about strengthening weights on synapses – synapses form and vanish all the time. Connections are binary, with a scalar permanence (a quick sketch below).
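A tiny sketch of that idea (my own illustration, not Hawkins's; the update rule and thresholds are assumptions): each potential synapse carries a scalar permanence, and the binary connection exists only while permanence is above a threshold, so synapses effectively form and vanish as learning nudges permanence up or down.

```r
# Each potential synapse has a scalar permanence in [0, 1]; the connection
# itself is binary: present only while permanence >= threshold.
update_synapses <- function(permanence, active,
                            inc = 0.05, dec = 0.02, threshold = 0.5) {
  permanence <- permanence + ifelse(active, inc, -dec)  # nudge up or down
  permanence <- pmin(pmax(permanence, 0), 1)            # clamp to [0, 1]
  list(permanence = permanence,
       connected = permanence >= threshold)             # binary connections
}
syn <- update_synapses(runif(10, 0.4, 0.6), active = runif(10) > 0.5)
syn$connected
```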
The two-layer learning system (classification problem)
- Layer 0 is the original problem:
  - receive labeled and unlabeled data
  - train a classifier on the labeled examples; predict for the others.
- Layer 1 is a meta-classifier that tries to learn on which data layer 0 does well (binary classification: is layer 0 correct or not?). See the sketch below.
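A rough sketch of the scheme (my own illustration; the choice of classifier and all variable names are arbitrary):

```r
library(randomForest)  # any off-the-shelf classifier would do

set.seed(1)
n <- 300
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- factor(X$x1 + X$x2 + rnorm(n, sd = 0.5) > 0)
labeled <- 1:150                     # pretend only half the data is labeled

# Layer 0: ordinary classifier trained on the labeled examples
layer0 <- randomForest(X[labeled, ], y[labeled])

# Layer 1: binary target is whether layer 0 got each labeled point right
# (out-of-bag predictions avoid trivially perfect self-classification)
correct <- factor(predict(layer0) == y[labeled])
layer1 <- randomForest(X[labeled, ], correct)

# On the unlabeled data, layer 1 estimates where layer 0 can be trusted
trust <- predict(layer1, X[-labeled, ], type = "prob")[, "TRUE"]
```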
Policy Costs
Framing the question: we need to discuss these issues, then get someone to be point-person on each:
- Policy costs – Why this is important, what’s already known.
- Typical ways these concerns are addressed (e.g. quadratic costs on control); similarities & differences.
- Discussion of policy costs as “Regularization”
- Why stochastic?
- Background: Reed model.
Results so far (discuss these, decide what needs more study):
- Introduce & characterize the impact of different norms: L1, L2, asymmetric, fixed costs (sketched after this list)
- Comparison to Reed
- The double-pay of policy costs: paying for the transactions themselves and for the resulting deviation from the optimal policy.
- Apples-to-apples comparisons between these norms.
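For reference, a quick sketch of the candidate cost structures (my notation; the coefficients `c1`, `c2`, `c3` are hypothetical), each penalizing an adjustment of the policy `h` away from the previous period's `h_prev` differently:

```r
# Cost of adjusting the policy from h_prev to h under each candidate norm;
# c1, c2, c3 are illustrative cost coefficients.
L1_cost    <- function(h, h_prev, c1 = 1)   c1 * abs(h - h_prev)     # linear in the adjustment
L2_cost    <- function(h, h_prev, c2 = 1)   c2 * (h - h_prev)^2      # quadratic: favors smooth control
fixed_cost <- function(h, h_prev, c3 = 0.5) c3 * (h != h_prev)       # flat fee for any change at all
asym_cost  <- function(h, h_prev, c1 = 1)   c1 * pmax(h - h_prev, 0) # e.g. only increases are costly

dh <- seq(-1, 1, length.out = 201)
matplot(dh, cbind(L1_cost(dh, 0), L2_cost(dh, 0),
                  fixed_cost(dh, 0), asym_cost(dh, 0)),
        type = "l", lty = 1, xlab = "policy adjustment", ylab = "cost")
legend("top", c("L1", "L2", "fixed", "asymmetric"), col = 1:4, lty = 1)
```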
Discussion/Conclusion – what do we highlight?
- Must first answer - which journal/audience?
Misc
Modified the css to use twitter-bootstrap striped tables as the table default, since markdown won't add a class to tables automatically. Hopefully this doesn't cause trouble with other table layouts.
Note that xtable can handle this by toggling the option `html.table.attributes = getOption("xtable.html.table.attributes", "class=table-striped")`, so setting the html class of tables should work fine in knitr. On the flip side, it sounds like pandoc does better with ascii tables as input than with html tables?
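For instance (a minimal sketch; the option name comes straight from the xtable defaults quoted above, and the data is just a placeholder):

```r
library(xtable)
# Set the attribute globally so every html table picks up the bootstrap class
options(xtable.html.table.attributes = "class=table-striped")
print(xtable(head(mtcars)), type = "html")
```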