
How does a neural network learn?

While neural networks do contain learned information, describing them plainly as storing “compressed knowledge” isn’t quite accurate.

Neural networks store patterns of weights and biases that were optimized during training. These parameters allow the network to recognize patterns and make predictions, but they don’t store knowledge in a way that’s analogous to human memory or a traditional database.


Think of it more like a complex mathematical function that’s been tuned to transform inputs into desired outputs. The “knowledge” isn’t stored in an easily interpretable or compressed format - it’s distributed across billions of parameters in ways that often aren’t straightforward to analyze or understand.


This is why neural networks can sometimes: 

  • Make confident predictions that are completely wrong

  • Fail to generalize in expected ways

  • Have difficulty transferring knowledge to new contexts

  • Produce inconsistent outputs


How does a neural network actually store and process information?

Consider this as the basic neuron operation:

y = f(w · x + b)

where y is the output, f is the activation function (ReLU, softmax, sigmoid, etc.), x is the input, w are the weights, and b is the bias. A short code sketch of this operation follows the structure notes below.

Network structure:

  • Instead of having a clear “knowledge database,” a neural network consists of layers of interconnected artificial neurons

  • Each connection has a weight, and each neuron has a bias value

  • These weights and biases are what get adjusted during training
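
As a rough illustration (not from the original post), here is that single-neuron operation written out in NumPy. The input, weight, and bias values are arbitrary stand-ins, and ReLU is just one possible choice for the activation f.

```python
# Minimal sketch of the neuron operation y = f(w · x + b).
# Values below are invented purely for illustration.
import numpy as np

def relu(z):                       # one common choice for the activation f
    return np.maximum(0.0, z)

x = np.array([0.5, -1.2, 3.0])     # input signals
w = np.array([0.8, 0.1, -0.4])     # learned weights, one per input
b = 0.2                            # learned bias

y = relu(np.dot(w, x) + b)         # weighted sum, shifted by the bias, passed through f
print(y)                           # the neuron's single output signal
```

A layer is just many such neurons computed at once, and a network stacks many layers of them.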


Learning Process:

  • The network doesn’t memorize facts - it learns to recognize patterns through repeated exposure to training data.

  • During training, it adjusts those weights and biases to minimize prediction errors (a toy sketch of this adjustment follows this list).

  • This is more like learning to recognize patterns than storing explicit knowledge.
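
To make the "adjusts weights and biases to minimize prediction errors" step concrete, here is a toy gradient-descent sketch for a single linear neuron. The data, learning rate, and iteration count are invented; real networks do the same kind of adjustment across millions or billions of parameters using backpropagation.

```python
# Toy learning loop: repeatedly nudge w and b so predictions get closer to targets.
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])   # toy inputs
t = np.array([1.0, 3.0, 5.0, 7.0])   # toy targets (they follow t = 2x + 1)

w, b = 0.0, 0.0                      # the model starts with no "knowledge"
lr = 0.05                            # learning rate: how big each adjustment is

for _ in range(2000):
    y = w * X + b                    # current predictions
    err = y - t                      # prediction errors
    w -= lr * np.mean(err * X)       # adjust the weight to shrink the error
    b -= lr * np.mean(err)           # adjust the bias the same way

print(round(w, 3), round(b, 3))      # ends up near 2.0 and 1.0
```

Nothing in the loop records "the rule is 2x + 1"; the rule only shows up implicitly in the final values of w and b.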


What’s actually stored:

  • Primarily mathematical parameters (weights and biases)

  • These parameters define how input signals should be transformed

  • The “knowledge” is implicit in these transformations, not explicitly stored (see the sketch after this list)
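
One way to see this is to list a model’s parameters: there are only named arrays of numbers, nothing resembling facts or rules. This sketch assumes PyTorch and a small two-layer network chosen purely for illustration.

```python
# Everything a network "stores" is visible here: named tensors of weights and biases.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 8),   # layer 1: 4 inputs -> 8 neurons (32 weights + 8 biases)
    nn.ReLU(),
    nn.Linear(8, 2),   # layer 2: 8 inputs -> 2 outputs (16 weights + 2 biases)
)

for name, param in model.state_dict().items():
    print(name, tuple(param.shape))   # e.g. "0.weight (8, 4)" -- just numbers
```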


Key Differences from Human Knowledge:

  • No semantic understanding

  • No concept of causation

  • No ability to reason about stored information

  • Can’t explain its own decision-making process

  • Can’t easily transfer learning to new contexts


Example: when a neural network learns to recognize a cat, it’s not storing “cats have fur, four legs, etc.” Instead, it’s storing parameters that transform pixel values through mathematical operations that happen to output “cat” when shown cat images.
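
As a hedged sketch of that idea, the snippet below pushes a flattened image through two weight matrices to get class scores. The weights are random placeholders rather than a trained cat detector, so the output is meaningless; the point is that nothing like “fur” or “four legs” appears anywhere, only matrix arithmetic over pixel values.

```python
# Sketch of a forward pass: pixels -> hidden features -> class scores.
# Weights are random stand-ins; a real network would have trained values here.
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random(64 * 64)                 # a flattened 64x64 grayscale image

W1, b1 = 0.01 * rng.standard_normal((128, 64 * 64)), np.zeros(128)
W2, b2 = 0.01 * rng.standard_normal((2, 128)), np.zeros(2)

hidden = np.maximum(0.0, W1 @ pixels + b1)   # weighted sums + ReLU
scores = W2 @ hidden + b2                    # one score per class

labels = ["cat", "not cat"]
print(labels[int(np.argmax(scores))])        # the "knowledge" lives only in W1, b1, W2, b2
```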


Think of it less like a library of knowledge and more like a complex filter that’s been shaped by training data to transform inputs into useful outputs. 

