Part 5 – The Machine-Learning Shift
Exploring the rise of machine learning in the 1990s, including key algorithms, applications, and the impact of data.
📚 Part 5 of 6 in How Did We Get Here?
Previously: Part 4 – The Expert System Era: Knowledge is Power (1980s). Next up: Part 6 – The Deep Learning Revolution: Neural Networks Strike Back (2000s-2010s).
Executive summary
By the 1990s, AI researchers pivoted from brittle, rule-centric expert systems to statistical learning methods that could learn patterns directly from data. Decision trees, Bayesian networks, and the newly-minted support-vector machine showed that algorithms, not handcrafted rules, could generalise from examples. The same decade’s data-mining boom, the internet’s explosive growth, and IBM’s Deep Blue chess victory cemented machine learning (ML) as the new AI paradigm, including in India, where software-export zones such as STPI Bhubaneswar laid the groundwork for today’s AI ecosystem. This post unpacks that shift, shows the mathematics behind early ML workhorses, and gives you three hands-on Colab tutorials you can run right now.
1 Introduction
The 1980s “expert-system era” (Part 4) promised to bottle human expertise as if-then rules. Yet knowledge engineers soon hit a knowledge-acquisition bottleneck—rules were expensive to write, hard to maintain, and brittle in novel situations. Meanwhile, cheap computing and exploding databases suggested a different strategy: learn the rules from data instead of writing them by hand. Statistical learning theory, honed for decades in pattern recognition, finally met the data required to realise it. As a result, the 1990s saw a decisive paradigm shift from “Knowledge is Power” to “Data is Power.” That shift—our focus here—set the stage for the deep-learning renaissance of the 2010s.
2 From Hand-Crafted Rules to Learned Models
2.1 Paradigm comparison
| Characteristic | Rule-Based Expert System | Machine-Learning Model |
|---|---|---|
| Knowledge source | Human domain experts | Empirical data |
| Scalability | Linear in number of rules | Improves with more data |
| Handling noise/uncertainty | Poor | Built-in probabilistic tolerance |
| Maintenance cost | High (manual updates) | Retrain or fine-tune |
Manual knowledge engineering faltered once domains grew too complex: DEC’s XCON needed ~10,000 rules and a dedicated upkeep team. By contrast, algorithms such as ID3 could ingest thousands of labelled examples and yield a decision policy automatically ([link.springer.com][1]).
Key ML advantages
- Scalability – bigger corpora improved accuracy rather than overwhelming authors.
- Robustness – probabilistic models degrade gracefully on edge cases.
- Automatic feature discovery – algorithms uncover patterns humans overlook.
```mermaid
flowchart LR
    A[Domain Experts] -->|Encode| R(Rule Base)
    R -->|Inference| O[Outputs]
    subgraph ML_Workflow
        D[Raw Data] --> P[Pre-processing]
        P --> T[Train Algorithm]
        T --> M[Model]
        M --> O2[Outputs]
    end
    classDef manual fill:#f8d7da;
    class R,A manual;
```
Rule vs ML workflow (flowchart contrasting manual rule entry with the automated training loop)
Tiny code taste
```python
# Rule-based: if temp > 37.5 °C then "fever"
def diagnose(temp_c):
    return "fever" if temp_c > 37.5 else "normal"

# Learned logistic-regression model (temperatures and labels are illustrative)
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[36.5], [37.0], [37.8], [38.4], [39.1]])  # body temperature, °C
y = np.array([0, 0, 1, 1, 1])                           # 1 = fever
model = LogisticRegression().fit(X, y)
model.predict([[38.2]])                                 # → array([1])
```
3 Early ML Algorithms & Successes
3.1 Decision Trees (ID3 → C4.5)
ID3 introduced entropy-based node splitting [1]. C4.5 generalised it to handle continuous features and pruning [2].
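To make the splitting criterion concrete, here is a minimal sketch of the information-gain computation behind ID3; the toy weather data and column names are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum(p * log2(p)) over the class proportions."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, feature_values):
    """Gain = H(S) minus the weighted entropy of the subsets a split induces."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Toy example: how well does "outlook" predict "play tennis"?
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play    = ["no",    "no",    "yes",      "yes",  "no",   "yes"]
print(information_gain(play, outlook))  # ≈ 0.67 bits
```

scikit-learn's DecisionTreeClassifier applies the same idea at scale: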
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)
plot_tree(clf)  # visualises splits
```
```mermaid
graph TD
    S1["sepal_len ≤ 5.45?"] -->|yes| C1[Leaf: setosa]
    S1 -->|no| S2["petal_len ≤ 2.45?"]
    S2 -->|yes| C2[Leaf: versicolor]
    S2 -->|no| C3[Leaf: virginica]
```
Example decision tree (toy tree splitting on sepal and petal length)
3.2 Bayesian Networks
Pearl’s 1988 treatise revived probabilistic reasoning ([amazon.com][3]). By the mid-1990s, BN-powered diagnostic tools predicted liver disorders with clinically useful accuracy ([citeseerx.ist.psu.edu][4], [cs.ru.nl][5]).
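As a flavour of the inference these tools performed, here is a minimal two-node sketch (Disease → Test) that computes a posterior with Bayes' rule; the probabilities are made up for illustration.

```python
# Two-node Bayesian network: Disease -> Test (illustrative probabilities only)
p_disease = 0.01                 # prior P(D = true)
p_pos_given_d = 0.95             # sensitivity, P(T = + | D = true)
p_pos_given_not_d = 0.05         # false-positive rate, P(T = + | D = false)

# P(T = +) by marginalising over the parent node
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Posterior P(D = true | T = +) via Bayes' rule
posterior = p_pos_given_d * p_disease / p_pos
print(f"P(disease | positive test) = {posterior:.2f}")  # ≈ 0.16
```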
3.3 Support Vector Machines (SVM)
Cortes & Vapnik’s 1995 paper formalised margin maximisation ([link.springer.com][6]). The kernel trick let linear algebra solve non-linear problems in high-dimensional feature spaces.
```python
from sklearn.svm import SVC
clf = SVC(kernel="rbf").fit(X_train, y_train)  # fit a kernel SVM on any labelled training set
```
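To see why the trick works, one can check numerically that a polynomial kernel equals an ordinary dot product in an explicitly expanded feature space. This is a standard textbook identity, not something specific to this post; the vectors below are arbitrary.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = phi(x) @ phi(z)   # dot product in the 3-D feature space
kernel   = (x @ z) ** 2      # polynomial kernel evaluated in the original 2-D space
print(explicit, kernel)      # both print 16.0: same value, no explicit expansion needed
```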
3.4 IBM Deep Blue (1997)
Deep Blue’s 32-node RS/6000 SP supercomputer evaluated 200 M positions/s [3]. After losing Game 1, it defeated Kasparov 3½-2½, a watershed media moment for AI [4].
Mathematical insight: The margin in SVM is the distance between the decision boundary and the nearest data points (support vectors). Maximizing this margin improves generalization to unseen data.
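In symbols (the standard textbook formulation, not taken from the post's notebooks), the hard-margin SVM solves

$$
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,n,
$$

and the resulting margin width is 2 / ‖w‖, so minimising ‖w‖ is exactly maximising the margin.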
```mermaid
graph TD
    subgraph "SVM Classification"
        A["● Class +1"]
        B["○ Class -1"]
        C[Decision Boundary]
        D["Support Vectors"]
        E["Maximum Margin"]
    end
    A -.-> D
    B -.-> D
    D --> C
    C --> E
```
SVM concept diagram (Shows class separation with maximum margin decision boundary)
4 AI in the 90s – Real-World Applications
- Credit scoring – Neural nets cut default rates in US credit-union data ([sciencedirect.com][9]). Indian banks began pilot scoring systems late-decade ([researchgate.net][10]).
- Market-basket analysis – Agrawal & Srikant’s 1994 Apriori algorithm extracted shopping patterns an order of magnitude faster than its predecessors ([vldb.org][11], [ibm.com][12]); see the sketch after this list.
- Customer segmentation – Decision-tree ensembles boosted telco churn prediction accuracy.
- Web search – AltaVista’s crawler fuelled TF-IDF ranking; Google’s PageRank (1998) soon leveraged link structure.
- Recommenders – Amazon (1998) deployed item-to-item collaborative filtering, an association-rule cousin.
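As a flavour of what Apriori automates, here is a minimal sketch of its first two passes, counting item and pair supports over a toy basket list; the baskets and the support threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
min_support = 3  # keep itemsets that appear in at least 3 of the 5 baskets

# Pass 1: frequent single items
item_counts = Counter(item for basket in baskets for item in basket)
frequent_items = {item for item, c in item_counts.items() if c >= min_support}

# Pass 2: candidate pairs are built only from frequent items (the Apriori pruning step)
pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(frequent_items & basket), 2)
)
frequent_pairs = {pair: c for pair, c in pair_counts.items() if c >= min_support}
print(frequent_pairs)  # four frequent pairs, e.g. ('bread', 'milk'): 3 and ('beer', 'diapers'): 3
```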
```mermaid
graph LR
    FIN[Finance] -- SVM / NNs --> CREDIT[Risk Scoring]
    RET[Retail] -- Apriori --> BASKET[Association Rules]
    WEB[Web] -- Crawlers --> SEARCH[Search Engines]
    TEL[Telecom] -- Trees --> CHURN[Churn Prediction]
```
1990s AI ecosystem (nodes for Finance, Retail, Web, Telecom connected to ML methods)
5 Indian Context
5.1 IT-services boom
India’s software exports rocketed from $175 M in 1990 to $8.7 B by 2000, a compound annual growth rate of roughly 50 % ([faculty.washington.edu][13]). Bangalore earned “Silicon Valley of India” status ([wired.com][14]).
5.2 Early AI adoption
- Banking – ICICI experimented with neural-network loan risk models.
- Agriculture – prototype decision support systems helped optimise irrigation and pest control [5].
- Education – IITs and IISc rolled out elective ML courses by 1998.
5.3 Odisha spotlight
The Software Technology Park of India (STPI), Bhubaneswar opened in 1990, creating a data-link hub and incubation programmes that later hosted regional AI startups [6].
```mermaid
timeline
    title India’s Tech Evolution 1990-1999
    1990 : STPI Bhubaneswar founded
    1991 : Economic Liberalisation
    1993 : VSNL brings public Internet
    1995 : NASSCOM push on software exports
    1998 : IT Act drafted
```
India’s tech-ecosystem timeline (1991 Liberalisation → 1993 Internet → 1998 IT Act etc.)
6 Hands-On Demo Section
All three tutorials are available as Colab notebooks; click “Open in Colab,” run, and experiment with the sliders.
| Tutorial | Colab Link | Key Concepts |
|---|---|---|
| Decision Tree Classifier | https://colab.research.google.com/drive/ | Entropy, pruning, decision boundaries |
| Naïve Bayes Text Spam Filter | https://colab.research.google.com/drive/ | Bag-of-words, Laplace smoothing |
| Support Vector Machine | https://colab.research.google.com/drive/ | Kernels, margin, cross-validation |
6.1 Decision Tree – Iris demo
```python
!pip install -q "scikit-learn==1.5.*" matplotlib

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X, y = iris.data[:, :2], iris.target      # sepal dims only
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
plot_tree(clf, feature_names=iris.feature_names[:2]);
```
Extension: try max_depth=None and note over-fitting warning.
6.2 Naïve Bayes – Spam detection
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X = CountVectorizer().fit_transform(texts)                      # texts: list of e-mail strings
X_train, X_test, y_train, y_test = train_test_split(X, labels)  # labels: 1 = spam, 0 = ham
model = MultinomialNB(alpha=1.0).fit(X_train, y_train)
```
Try This: adjust alpha with a slider (FloatSlider) and watch precision-recall shift.
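One way to wire that up is a minimal ipywidgets sketch; it assumes the X_train/X_test/y_train/y_test names from the snippet above and binary 0/1 labels.

```python
from ipywidgets import FloatSlider, interact
from sklearn.metrics import precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB

def evaluate(alpha=1.0):
    # Refit the spam filter for the chosen smoothing strength and report test metrics
    nb = MultinomialNB(alpha=alpha).fit(X_train, y_train)
    pred = nb.predict(X_test)
    print(f"precision={precision_score(y_test, pred):.2f}  recall={recall_score(y_test, pred):.2f}")

interact(evaluate, alpha=FloatSlider(min=0.01, max=5.0, step=0.01, value=1.0));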
6.3 SVM – Kernel playground
```python
from sklearn.datasets import make_moons
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC

X, y = make_moons(noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
DecisionBoundaryDisplay.from_estimator(clf, X, alpha=0.4);
```
Common Pitfall: Large C overfits; observe jagged boundaries.
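For instance, a quick comparison, reusing X and y from the sketch above; the two C values are arbitrary illustrations.

```python
from sklearn.svm import SVC

for C in (0.1, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)   # X, y from the make_moons snippet above
    print(f"C={C}: training accuracy = {clf.score(X, y):.2f}")
# Larger C chases individual points: training accuracy typically rises while the boundary turns jagged.
```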
Learning Checkpoints
Concept Check 1: Why does entropy guide decision-tree splits better than simple accuracy?
Concept Check 2: How does the kernel trick avoid computing in an infinite-dimensional feature space?
Common Pitfall: Treating the Naïve Bayes independence assumption as gospel; watch for correlated features.
Assessment
- Quiz: What property of SVMs maximises generalisation?
- Coding challenge: Replace Iris with Wine dataset and repeat Tutorial 1.
- Case study prompt: Argue whether Deep Blue was really “AI.” Support with 1990s definitions.
Solutions are included at the bottom of each Colab notebook.
Preparing for Part 6
Statistical learning solved many 1990s problems, yet hand-crafted features were still king. Next time we’ll see how representation learning and neural networks staged a comeback, giving birth to deep learning.
Happy learning – see you in Part 6!
📝 Series Navigation
1. https://link.springer.com/article/10.1007/BF00116251?utm_source=odishaai.org – “Induction of decision trees | Machine Learning”
2. https://link.springer.com/article/10.1007/BF00993309?utm_source=odishaai.org – “C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan …”
3. https://www.ibm.com/history/deep-blue?utm_source=odishaai.org – “Deep Blue - IBM”
4. https://www.wired.com/2011/05/0511ibm-deep-blue-beats-chess-champ-kasparov?utm_source=odishaai.org – “May 11, 1997: Machine Bests Man in Tournament-Level Chess Match”
5. https://www.researchgate.net/publication/221916044_Decision_Support_Systems_in_Agriculture_Some_Successes_and_a_Bright_Future?utm_source=odishaai.org – “Decision Support Systems in Agriculture: Some Successes and a …”
6. https://bhubaneswar.stpi.in/en?utm_source=odishaai.org – “STPI - Bhubaneswar - Software Technology Park of India”