Formal Systems
Cognition and cognitive theories are formal systems, so we use mathematics to study them
Abstract concepts can be reasoned about precisely when situated in formal systems
Neural networks are continuous systems

Constructing the continuum
Axiomatization: Describe the basic properties and declare them to be true by definition
Construction: Use simpler objects and operations to explicitly define more complex models
Equivalence classes: Partition a set based on some rule

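The construction and equivalence-class entries above can be sketched together. This is a minimal illustration, assuming the familiar example of building rationals from integer pairs: (a, b) and (c, d) are equivalent when a*d == b*c. The helper name `equivalence_classes` and the sample pairs are hypothetical, and `Fraction` is used only as a convenient canonical label for each class.

```python
from collections import defaultdict
from fractions import Fraction

def equivalence_classes(items, key):
    """Partition items into classes; two items are equivalent when key() agrees."""
    classes = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return list(classes.values())

# Hypothetical sample: integer pairs (numerator, denominator).
pairs = [(1, 2), (2, 4), (3, 6), (1, 3), (2, 6)]

# Pairs that reduce to the same fraction land in the same class.
classes = equivalence_classes(pairs, key=lambda p: Fraction(p[0], p[1]))
print(classes)
```

Each resulting class plays the role of one rational number, which is the sense in which a more complex object is defined from simpler ones.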
Dynamic Systems
Recurrent, learning, and biological neural networks
Motor control is the effector of a dynamic system
The mind is an abstract dynamical system with continuous state variables that are not activation values of units or representations

NNs as Probabilistic Models
Used for: stochastically searching for global optima; representing and rationally coping with uncertainty; measuring information
Deployed in: neural network models; symbolic models with nondeterminism and uncertainty (e.g. inferring knowledge from experience, using knowledge to infer outputs given inputs)

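For the "measuring information" use above, one common measure is Shannon entropy. The sketch below assumes that measure and a discrete distribution given as a list of probabilities; the function name `entropy_bits` and the example distributions are illustrative, not taken from the sheet.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Outcomes with probability 0 contribute nothing and are skipped.
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair = entropy_bits([0.5, 0.5])      # maximal uncertainty for two outcomes
biased = entropy_bits([0.9, 0.1])    # less uncertainty
certain = entropy_bits([1.0, 0.0])   # no uncertainty
```

The more evenly the probability is spread, the higher the entropy, so the fair coin carries more missing information than the biased one.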
Optimization
Processing: Activation dynamics. Maximizes the well-formedness (harmony) of the activation pattern, which depends on the connection weights. Spreading-activation dynamics is an optimization algorithm for the representation.
Learning: Weight dynamics. Minimizes error. Weight-adjustment dynamics is an optimization algorithm for the knowledge in the weights: a learning algorithm.
Probabilistic modelling: Parameters of the statistical model change as more data is received. They are optimized for the likelihood of the data or for the Bayesian posterior probability of the parameters given the data.

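The learning entry above (weight dynamics that minimize error) can be illustrated with a very small case. This sketch assumes a single linear unit y = w * x + b, squared error, and plain per-example gradient descent; the learning rate, epoch count, and sample data are arbitrary choices for illustration.

```python
def train_linear_unit(samples, lr=0.1, epochs=200):
    """Adjust w and b by gradient descent on squared error for y = w * x + b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = (w * x + b) - target   # signed prediction error
            w -= lr * error * x            # gradient of 0.5 * error**2 w.r.t. w
            b -= lr * error                # gradient of 0.5 * error**2 w.r.t. b
    return w, b

# Hypothetical data generated by y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_linear_unit(samples)
print(round(w, 3), round(b, 3))
```

The weights drift toward the values that generated the data, which is the sense in which weight adjustment is an optimization algorithm for the knowledge stored in the weights.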
Fourier analysis
f(x) = \sum_k c_k e^{ikx} employs a basis of imaginary exponentials, {e^{ikx}}_{k \in \mathbb{Z}}
Equivalently, a basis of cos(kx) and sin(kx)
The Fourier coefficient c_k states how strongly an oscillation of angular frequency k (period 2\pi/k) is present in f
{f(t)}_t describes f in the time / spatial domain
{c_k}_k describes f in the frequency / spatial-frequency domain

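To make the coefficients concrete, the sketch below approximates c_k = (1/2\pi) \int_0^{2\pi} f(x) e^{-ikx} dx by an average over evenly spaced samples. It assumes f is 2\pi-periodic; the function name, the sample count, and the test signal are illustrative choices.

```python
import cmath
import math

def fourier_coefficient(f, k, n_samples=256):
    """Approximate c_k for a 2*pi-periodic f by averaging over a uniform grid."""
    total = 0j
    for n in range(n_samples):
        x = 2 * math.pi * n / n_samples
        total += f(x) * cmath.exp(-1j * k * x)
    return total / n_samples

def signal(x):
    # Hypothetical signal with content only at k = +/-2 and k = +/-5.
    return 3 * math.cos(2 * x) + math.sin(5 * x)

c2 = fourier_coefficient(signal, 2)  # cos(2x) contributes 3/2 at k = 2
c5 = fourier_coefficient(signal, 5)  # sin(5x) contributes 1/(2i) = -0.5i at k = 5
c3 = fourier_coefficient(signal, 3)  # nothing oscillates at k = 3
```

Here `signal` is the time / spatial-domain description, and the handful of nonzero coefficients is the frequency-domain description of the same function.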
Support Vector Machines
Use supervised learning to learn a region of activation space for each concept
Classification is driven only by the training examples near the region boundary
Wide margin: the error function favors a large margin between the training samples and the boundary it posits for separating the categories
Slack variable: a variable, minimized for each training example, that "picks up the slack" between the point and the category region it should be in
Kernel trick: implicitly maps the data into a high-dimensional space in which classification conceptually takes place (implemented through a kernel function)

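Two of the entries above lend themselves to a short sketch. It assumes labels in {-1, +1}, a linear boundary w . x + b = 0 with the margin at |w . x + b| = 1, and a degree-2 polynomial kernel as the example kernel; the boundary, points, and function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slack(w, b, x, y):
    """Slack for one example: 0 if it is on the correct side, outside the margin."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))

def poly2_kernel(x, z):
    """Kernel value computed directly in the original 2-D space."""
    return dot(x, z) ** 2

def poly2_features(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

w, b = (1.0, 0.0), 0.0                         # hypothetical boundary x1 = 0
slack_far = slack(w, b, (3.0, 1.0), +1)        # well inside its region
slack_inside = slack(w, b, (0.5, 1.0), +1)     # correct side, inside the margin
slack_wrong = slack(w, b, (-1.0, 0.0), +1)     # wrong side of the boundary

x, z = (1.0, 2.0), (3.0, 0.5)
implicit = poly2_kernel(x, z)
explicit = dot(poly2_features(x), poly2_features(z))
```

`implicit` and `explicit` agree up to rounding, which is the point of the kernel trick: the higher-dimensional inner product is obtained without ever building the feature vectors.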
Discrete structures of distribution patterns
Vectors v in R^n: distributed representation
With respect to an appropriate conceptual basis for V, the components of a representation v indicate the strength of a set of basis concepts in v: gradient conceptual representation
Eigenbasis rescales components
Analyse the entire distribution within V of the representations {v^k} of a set of represented items {x^k}
Clusters of {v^k} constitute a conceptual group and may be hierarchically structured
V can be constructed such that greater distance between v^{k} and v^{t} means greater mental distinguishability of x^{k} and x^{t}

Harmony
Weight matrices, error functions, learning as optimization
Activation vectors, well-formedness = harmony function, processing as optimization
Parallel, violable-constraint satisfaction
Schemas/prototypes in Harmony landscapes
Finding local optima is a deterministic problem; finding global optima requires randomized/stochastic algorithms

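The sheet does not give a formula for harmony, so the sketch below assumes a common Hopfield-style form: units take values +1 or -1, the weight matrix is symmetric with a zero diagonal, and H(a) = 0.5 * sum_ij a_i W_ij a_j. Under those assumptions, deterministic unit-by-unit updates never lower H and stop at a local optimum. The weights and start state are hypothetical.

```python
def harmony(weights, a):
    """H(a) = 0.5 * sum_ij a_i * W_ij * a_j (assumed form, see lead-in)."""
    n = len(a)
    return 0.5 * sum(a[i] * weights[i][j] * a[j]
                     for i in range(n) for j in range(n))

def settle(weights, a, max_sweeps=100):
    """Flip any unit that disagrees with its net input until nothing changes."""
    a = list(a)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(a)):
            net = sum(weights[i][j] * a[j] for j in range(len(a)))
            if net * a[i] < 0:      # flipping this unit strictly raises harmony
                a[i] = -a[i]
                changed = True
        if not changed:
            break
    return a

# Units 0 and 1 support each other; both conflict with unit 2.
weights = [[0, 1, -1],
           [1, 0, -1],
           [-1, -1, 0]]
start = [1, -1, 1]
settled = settle(weights, start)
print(harmony(weights, start), harmony(weights, settled))
```

Because every flip raises harmony, this procedure can only climb to the nearest peak; escaping to a higher peak elsewhere is where the randomized algorithms mentioned above come in.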
Inductive learning
Finding the best hypothesis H within a hypothesis space about the world that the learner is trying to understand
Goodness of a hypothesis is determined jointly by how well H fits the data that the learner has received about the world and by how simple H is
A hypothesis is a probabilistic data generator
Maximum Likelihood Principle: pick the H under which the observed data are most probable
Maximum A Posteriori Principle: Bayesian principle that says to pick the hypothesis with the highest a posteriori probability, balancing the likelihood of the data against the a priori probability of H. Better still, do not pick a single H; maintain a degree of belief for every H in the hypothesis space
Maximum Entropy Principle (Maxent): pick the H with the maximum missing information, among those H that are consistent with the known data
Minimum Description Length Principle: shorter is better
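The difference between the maximum-likelihood and maximum-a-posteriori choices shows up in a small example. This sketch assumes a hypothesis space of three coin biases, a prior that favors the fair coin, and five observed flips; all numbers and names are made up for illustration.

```python
def likelihoods_and_posterior(hypotheses, prior, heads, tails):
    """Return P(data | H) and P(H | data) for each coin-bias hypothesis H."""
    likelihood = [h ** heads * (1 - h) ** tails for h in hypotheses]
    joint = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(joint)                       # P(data), the normalizer
    return likelihood, [j / evidence for j in joint]

hypotheses = [0.3, 0.5, 0.8]   # candidate values of P(heads)
prior = [0.1, 0.8, 0.1]        # hypothetical prior favoring the fair coin
likelihood, posterior = likelihoods_and_posterior(hypotheses, prior, heads=4, tails=1)

ml_hypothesis = hypotheses[likelihood.index(max(likelihood))]
map_hypothesis = hypotheses[posterior.index(max(posterior))]
print(ml_hypothesis, map_hypothesis)
```

With this little data the likelihood alone prefers the heads-biased coin, while the prior keeps the MAP choice at the fair coin. The full `posterior` list is the "degree of belief for every H" alternative to picking a single hypothesis.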
