Part 5 – The Machine-Learning Shift

📚 Part 5 of 6 in How Did We Get Here?

Previously: Part 4 – The Expert System Era – Knowledge is Power (1980s)

Next up: Part 6 – Deep Learning Revolution – Neural Networks Strike Back (2000s-2010s)


Executive summary

During the 1990s, AI researchers pivoted from brittle, rule-centric expert systems to statistical learning methods that could learn patterns directly from data. Decision trees, Bayesian networks, and the newly minted support-vector machine showed that algorithms, not handcrafted rules, could generalise from examples. The same decade’s data-mining boom, the internet’s explosive growth, and IBM’s Deep Blue chess victory cemented machine learning (ML) as the new AI paradigm. In India, software-export hubs such as STPI Bhubaneswar laid the groundwork for today’s AI ecosystem. This post unpacks that shift, walks through the mathematics behind the early ML workhorses, and gives you three hands-on Colab tutorials you can run right now.


1 Introduction

The 1980s “expert-system era” (Part 4) promised to bottle human expertise as if-then rules. Yet knowledge engineers soon hit the knowledge-acquisition bottleneck: rules were expensive to write, hard to maintain, and brittle in novel situations. Meanwhile, cheap computing and exploding databases suggested a different strategy: learn the rules from data instead of writing them by hand. Statistical learning theory, honed over decades of pattern-recognition research, finally met the data and compute needed to put it into practice. As a result, the 1990s saw a decisive paradigm shift from “Knowledge is Power” to “Data is Power.” That shift, our focus here, set the stage for the deep-learning renaissance of the 2010s.


2 From Hand-Crafted Rules to Learned Models

2.1 Paradigm comparison

| Characteristic | Rule-Based Expert System | Machine-Learning Model |
| --- | --- | --- |
| Knowledge source | Human domain experts | Empirical data |
| Scalability | Linear in number of rules | Improves with more data |
| Handling noise/uncertainty | Poor | Built-in probabilistic tolerance |
| Maintenance cost | High (manual updates) | Retrain or fine-tune |

Manual knowledge engineering faltered once domains grew too complex: DEC’s XCON needed ~10,000 rules and a dedicated upkeep team. By contrast, algorithms such as ID3 could ingest thousands of labelled examples and yield a decision policy automatically ([link.springer.com][1]).

Key ML advantages

  • Scalability – bigger corpora improved accuracy rather than overwhelming authors.
  • Robustness – probabilistic models degrade gracefully on edge cases.
  • Automatic feature discovery – algorithms uncover patterns humans overlook.

    flowchart LR
    A[Domain Experts] -->|Encode| R(Rule Base)
    R -->|Inference| O[Outputs]

    subgraph ML_Workflow
        D[Raw Data] --> P[Pre-processing]
        P --> T[Train Algorithm]
        T --> M[Model]
        M --> O2[Outputs]
    end

    classDef manual fill:#f8d7da;
    class R,A manual;

Rule vs ML workflow (flowchart contrasting manual rule entry with the automated training loop)

Tiny code taste

# Rule: if temp > 37.5°C then "fever"
def rule_based(temp): 
    return "fever" if temp > 37.5 else "normal"

# Learned logistic-regression model
from sklearn.linear_model import LogisticRegression
import numpy as np
X = np.array([[36.8],[38.2],[37.0],[39.1]])
y = np.array([0,1,0,1])          # 1 = fever
clf = LogisticRegression().fit(X,y)
print(clf.predict([[38.4]]))      # → array([1]), i.e. fever

3 Early ML Algorithms & Successes

3.1 Decision Trees (ID3 → C4.5)

ID3 introduced entropy-based node splitting [1]. C4.5 extended it with support for continuous features and pruning [2].
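
Under the hood, ID3 picks the split with the largest information gain, i.e. the largest drop in entropy. A minimal pure-Python sketch of that criterion (the helper names are ours, not scikit-learn’s):

import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

labels = np.array([0, 0, 0, 1, 1, 1])
print(information_gain(labels, labels[:3], labels[3:]))   # 1.0 bit – a perfect split

scikit-learn exposes the same criterion via criterion="entropy":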

from sklearn import tree, datasets
dt = tree.DecisionTreeClassifier(criterion="entropy", max_depth=3)
iris = datasets.load_iris()
dt.fit(iris.data, iris.target)
tree.plot_tree(dt)   # visualises splits

    graph TD
    S1["sepal_len ≤ 5.45?"] -->|yes| C1[Leaf: setosa]
    S1 -->|no| S2["petal_len ≤ 2.45?"]
    S2 -->|yes| C2[Leaf: versicolor]
    S2 -->|no| C3[Leaf: virginica]

Example decision tree (toy tree splitting on sepal and petal length)

3.2 Bayesian Networks

Pearl’s 1988 treatise revived probabilistic reasoning ([amazon.com][3]). By the mid-1990s, BN-powered diagnostic tools predicted liver disorders with clinically useful accuracy ([citeseerx.ist.psu.edu][4], [cs.ru.nl][5]).
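
At its core, a Bayesian network chains conditional probabilities together with Bayes’ rule. A minimal two-node sketch (Disease → Test, with made-up numbers) shows the kind of query a diagnostic BN answers:

# Illustrative two-node network: Disease -> Test (numbers are invented, not clinical)
p_disease = 0.01                  # prior               P(D)
p_pos_given_d = 0.95              # sensitivity         P(+|D)
p_pos_given_not_d = 0.05          # false-positive rate P(+|not D)

# Bayes' rule: P(D|+) = P(+|D) * P(D) / P(+)
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(f"P(disease | positive test) = {p_d_given_pos:.1%}")   # ≈ 16.1%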

3.3 Support Vector Machines (SVM)

Cortes & Vapnik’s 1995 paper formalised margin maximisation ([link.springer.com][6]). The kernel trick let linear algebra solve non-linear problems in high-dimensional feature spaces.
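
Concretely, a kernel evaluates inner products in the mapped feature space without ever constructing that space. A one-dimensional, degree-2 sketch (purely illustrative):

import numpy as np

# Degree-2 polynomial kernel in 1-D: k(x, z) = (1 + x*z)**2 equals the inner
# product of the explicit feature map phi(x) = [1, sqrt(2)*x, x**2], yet the
# kernel never builds phi.
def phi(x):
    return np.array([1.0, np.sqrt(2) * x, x ** 2])

x, z = 3.0, -2.0
print(phi(x) @ phi(z))     # explicit feature space: ≈ 25
print((1 + x * z) ** 2)    # kernel shortcut:          25

The RBF kernel used below plays the same game with an (implicitly) infinite-dimensional feature map.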

from sklearn.svm import SVC
# Reuses the toy temperature data (X, y) from the Section 2 code taste
svc = SVC(kernel='rbf', C=1.0, gamma='scale')
svc.fit(X, y)

Mathematical insight: the SVM margin is the distance between the decision boundary and the nearest training points (the support vectors). Maximising this margin improves generalisation to unseen data.
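
For reference, the hard-margin optimisation behind that statement, in standard textbook notation (a generic formulation, not tied to any library):

    \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
    \quad\text{subject to}\quad
    y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,n

The geometric margin equals 2/‖w‖, so minimising ‖w‖ maximises it; the soft-margin variant adds slack variables weighted by C, the parameter you will tune in the hands-on demo below.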

    graph TD
    subgraph "SVM Classification"
        A["● Class +1"] 
        B["○ Class -1"]
        C[Decision Boundary]
        D["Support Vectors"]
        E["Maximum Margin"]
    end
    
    A -.-> D
    B -.-> D
    D --> C
    C --> E

SVM concept diagram (class separation with a maximum-margin decision boundary)

3.4 IBM Deep Blue (1997)

Deep Blue’s 30-node RS/6000 SP supercomputer evaluated roughly 200 million chess positions per second [3]. After losing Game 1, it defeated Kasparov 3½–2½, a watershed media moment for AI [4].


4 AI in the 90s – Real-World Applications

  • Credit scoring – Neural nets cut default rates on US credit-union data ([sciencedirect.com][9]); Indian banks began piloting scoring systems late in the decade ([researchgate.net][10]).
  • Market-basket analysis – Agrawal & Srikant’s 1994 Apriori algorithm extracted shopping patterns an order of magnitude faster than its predecessors ([vldb.org][11], [ibm.com][12]); a minimal sketch follows the diagram below.
  • Customer segmentation – Decision-tree ensembles boosted telco churn prediction accuracy.
  • Web search – AltaVista’s crawler fuelled TF-IDF ranking; Google’s PageRank (1998) soon leveraged link structure.
  • Recommenders – Amazon (1998) deployed item-to-item collaborative filtering, an association-rule cousin.

    graph LR
    FIN[Finance] -- SVM / NNs --> CREDIT[Risk Scoring]
    RET[Retail] -- Apriori --> BASKET[Association Rules]
    WEB[Web] -- Crawlers --> SEARCH[Search Engines]
    TEL[Telecom] -- Trees --> CHURN[Churn Prediction]

1990s AI ecosystem (nodes for Finance, Retail, Web, and Telecom connected to ML methods)
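
To make the market-basket bullet concrete, here is a minimal sketch of frequent-itemset mining in the Apriori style, using mlxtend (a modern re-implementation, not the 1994 code) on a hypothetical four-basket dataset:

# pip install mlxtend
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

baskets = [['bread', 'milk'], ['bread', 'butter'], ['bread', 'milk', 'butter'], ['milk']]
enc = TransactionEncoder()
onehot = pd.DataFrame(enc.fit(baskets).transform(baskets), columns=enc.columns_)
print(apriori(onehot, min_support=0.5, use_colnames=True))   # frequent itemsets, e.g. {bread, milk}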


5 Indian Context

5.1 IT-services boom

India’s software exports rocketed from $175 M in 1990 to $8.7 B by 2000, a compound annual growth rate of roughly 50 % ([faculty.washington.edu][13]). Bangalore earned “Silicon Valley of India” status ([wired.com][14]).

5.2 Early AI adoption

  • Banking – ICICI experimented with neural-network loan risk models.
  • Agriculture – prototype decision support systems helped optimise irrigation and pest control [5].
  • Education – IITs and IISc rolled out elective ML courses by 1998.

5.3 Odisha spotlight

The Software Technology Parks of India (STPI) centre in Bhubaneswar opened in 1990, creating a data-link hub and incubation programmes that later hosted regional AI startups [6].

    timeline
    title India’s Tech Evolution 1990-1999
    1990 : STPI Bhubaneswar founded
    1991 : Economic Liberalisation
    1995 : VSNL launches public Internet access : NASSCOM pushes software exports
    1998 : IT Act drafted

India’s tech-ecosystem timeline (1991 Liberalisation → 1995 public Internet → 1998 IT Act draft, etc.)


6 Hands-On Demo Section

All three tutorials are available as Colab notebooks; click “Open in Colab,” run, and experiment with the sliders.

| Tutorial | Colab Link | Key Concepts |
| --- | --- | --- |
| Decision Tree Classifier | <https://colab.research.google.com/drive/> | Entropy, pruning, decision boundaries |
| Naïve Bayes Text Spam Filter | <https://colab.research.google.com/drive/> | Bag-of-words, Laplace smoothing |
| Support Vector Machine | <https://colab.research.google.com/drive/> | Kernels, margin, cross-validation |

6.1 Decision Tree – Iris demo

!pip install scikit-learn==1.5 pandas matplotlib ipywidgets -q
from sklearn import tree, datasets
from sklearn.model_selection import train_test_split
from ipywidgets import interact, IntSlider

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target        # sepal dimensions only
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

def train(max_depth=3):
    clf = tree.DecisionTreeClassifier(max_depth=max_depth, criterion="entropy")
    clf.fit(X_tr, y_tr)
    print(f"Depth {max_depth}: train acc {clf.score(X_tr, y_tr):.2f} | test acc {clf.score(X_te, y_te):.2f}")

interact(train, max_depth=IntSlider(value=3, min=1, max=10));

Extension: set max_depth=None and compare train vs test accuracy; the widening gap is the over-fitting signature.

6.2 Naïve Bayes – Spam detection

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Spambase ships pre-extracted bag-of-words frequencies (57 numeric features),
# so we feed them straight to MultinomialNB instead of re-vectorising raw text.
emails = fetch_openml("spambase", version=1, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
        emails.data, emails.target, test_size=0.2, random_state=42)
model = MultinomialNB(alpha=1.0)             # alpha = Laplace smoothing
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

Try This: adjust alpha with a slider (FloatSlider) and watch precision-recall shift.
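
One way to wire that up, assuming the variable names from the cell above (a sketch, not part of the official notebook):

from ipywidgets import interact, FloatSlider
from sklearn.metrics import precision_score, recall_score

def tune(alpha=1.0):
    # Refit with the chosen Laplace-smoothing strength and report macro-averaged scores
    nb = MultinomialNB(alpha=alpha).fit(X_train, y_train)
    pred = nb.predict(X_test)
    print(f"alpha={alpha:.2f}  precision={precision_score(y_test, pred, average='macro'):.3f}"
          f"  recall={recall_score(y_test, pred, average='macro'):.3f}")

interact(tune, alpha=FloatSlider(value=1.0, min=0.01, max=5.0, step=0.01));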

6.3 SVM – Kernel playground

!pip install mlxtend -q                       # provides plot_decision_regions
from sklearn import svm, datasets
from mlxtend.plotting import plot_decision_regions
from ipywidgets import interact
import matplotlib.pyplot as plt

X, y = datasets.make_moons(noise=0.3, random_state=0)

def plot(kernel='rbf', C=1.0):
    clf = svm.SVC(kernel=kernel, C=C).fit(X, y)
    plot_decision_regions(X, y, clf=clf)
    plt.title(f"SVM boundary ({kernel}, C={C})")
    plt.show()

interact(plot, kernel=['linear', 'rbf', 'poly'], C=(0.1, 10.0, 0.1));

Common Pitfall: Large C overfits; observe jagged boundaries.


Learning Checkpoints

Concept Check 1: Why does entropy guide decision-tree splits better than simple accuracy?

Concept Check 2: How does the kernel trick avoid computing in infinite-dimensional space?

Common Pitfall: Treating the Naïve Bayes independence assumption as gospel; watch out for correlated features.


Assessment

  1. Quiz: What property of SVMs maximises generalisation?
  2. Coding challenge: Replace Iris with Wine dataset and repeat Tutorial 1.
  3. Case study prompt: Argue whether Deep Blue was really “AI.” Support with 1990s definitions.

Solutions are included at the bottom of each Colab notebook.


Preparing for Part 6

Statistical learning solved many 1990s problems, yet hand-crafted features were still king. Next time we’ll see how representation learning and neural networks staged a comeback, giving birth to deep learning.



Happy learning – see you in Part 6!


References


  1. “Induction of decision trees,” Machine Learning – https://link.springer.com/article/10.1007/BF00116251

  2. “C4.5: Programs for Machine Learning by J. Ross Quinlan,” Machine Learning – https://link.springer.com/article/10.1007/BF00993309

  3. “Deep Blue,” IBM History – https://www.ibm.com/history/deep-blue

  4. “May 11, 1997: Machine Bests Man in Tournament-Level Chess Match,” Wired – https://www.wired.com/2011/05/0511ibm-deep-blue-beats-chess-champ-kasparov

  5. “Decision Support Systems in Agriculture: Some Successes and a Bright Future,” ResearchGate – https://www.researchgate.net/publication/221916044_Decision_Support_Systems_in_Agriculture_Some_Successes_and_a_Bright_Future

  6. “STPI – Bhubaneswar,” Software Technology Parks of India – https://bhubaneswar.stpi.in/en