Neural Network Architecture Selection Cheat Sheet

The Core Principle

Architecture = Inductive Bias. Match your architecture's assumptions to your data's structure.

Data Structure	Architecture Assumes	Use	Watch For
Grid/Spatial (images)	Nearby elements correlate	CNN	Misses global context
Sequential (text, time)	Order matters	Transformer/RNN	Cost and latency grow quickly with sequence length
Relational (networks)	Explicit relationships	GNN	Graph construction errors
Tabular (spreadsheets)	Minimal structure	Trees first, then MLP	Neural nets need far more data than trees
Multiple types	Separate encoders needed	Multimodal	Complexity without gain

Quick Decision by Modality

Vision

Limited data / edge deployment → CNN (ResNet, EfficientNet)
Large data + pretrained available → ViT
Zero-shot / retrieval / similarity → CLIP-style embeddings

Text

Understand / classify / extract → Encoder (BERT-style)
Generate / complete → Decoder (GPT-style)
Transform (translate, summarize) → Encoder-Decoder (T5)
Large knowledge base → RAG (retrieval + generation)

Time Series

Streaming + low latency → GRU/LSTM or TCN
Batch OK + long context → Transformer or SSM

Graph

One node/edge type → Standard GNN (GCN, GAT)
Multiple types → Heterogeneous GNN
Millions of nodes → GraphSAGE with sampling
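
The sampling idea behind the last rule can be sketched in a few lines. This is a minimal illustration of fixed-fanout neighbor sampling, one common way to bound the work done per node on very large graphs. The function name sample_neighbors, the adjacency-dict format, and the fanout value are assumptions made for illustration, not any library's API.

```python
import random

def sample_neighbors(adjacency, nodes, fanout, seed=0):
    """Return at most `fanout` randomly chosen neighbors for each node.

    `adjacency` is a hypothetical dict mapping node id -> list of neighbor ids.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sampled = {}
    for node in nodes:
        neighbors = adjacency.get(node, [])
        if len(neighbors) <= fanout:
            sampled[node] = list(neighbors)  # few neighbors: keep them all
        else:
            sampled[node] = rng.sample(neighbors, fanout)  # sample without replacement
    return sampled

# Toy graph: node 0 is a hub with 100 neighbors.
adjacency = {0: list(range(1, 101)), 1: [0, 2], 2: [1]}
batch = sample_neighbors(adjacency, nodes=[0, 1, 2], fanout=5)
```

With a fixed fanout, the cost of aggregating a node's neighborhood no longer depends on how many neighbors a hub node has, which is what makes mini-batch training on large graphs tractable.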

Multimodal

Fast retrieval/matching → Dual-encoder (CLIP-style)
Deep grounding (VQA) → Cross-attention
Loosely coupled signals → Late fusion
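
Why a dual-encoder suits fast retrieval can be shown with a small sketch: item embeddings are computed once and stored, so answering a query reduces to a similarity lookup. The embeddings below are made-up 3-dimensional vectors, and normalize and top_match are hypothetical helper names; a real system would obtain the vectors from its image and text encoders.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_match(query_vec, index):
    """Return the key of the indexed item most similar to the query."""
    q = normalize(query_vec)
    scores = {key: sum(a * b for a, b in zip(q, vec)) for key, vec in index.items()}
    return max(scores, key=scores.get)

# Made-up "image" embeddings, normalized once at indexing time.
index = {
    "cat.jpg": normalize([0.9, 0.1, 0.0]),
    "car.jpg": normalize([0.0, 0.2, 0.9]),
}

# Made-up "text" embedding for a query; a real system would call its text encoder here.
best = top_match([0.8, 0.2, 0.1], index)
```

Cross-attention models cannot precompute this way because each query-item pair must pass through the model together, which is the usual trade of speed for deeper grounding.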

Data Quantity Rules of Thumb

Samples	Approach
< 1,000	Classical ML. Heavy transfer learning if neural.
1,000–10,000	Transfer learning essential. Fine-tune pretrained.
10,000–100,000	Most architectures viable with pretrained start.
100,000+	Training from scratch becomes reasonable.
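
The table above can be encoded as a small lookup, sketched here. The function name recommend_approach is hypothetical. Because the table lists 10,000 in two rows, assigning each shared boundary value to the larger bucket is an assumption of this sketch.

```python
def recommend_approach(n_samples):
    """Map a labeled-sample count to the rule of thumb from the table."""
    if n_samples < 1_000:
        return "Classical ML; heavy transfer learning if neural"
    if n_samples < 10_000:
        return "Transfer learning essential; fine-tune pretrained"
    if n_samples < 100_000:  # assumption: exactly 10,000 falls in this bucket
        return "Most architectures viable with a pretrained start"
    return "Training from scratch becomes reasonable"

suggestion = recommend_approach(4_200)
```

These thresholds are rough guides; task difficulty and label quality can shift them in either direction.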

The Five Principles

Simple First: Try logistic regression or gradient boosting before neural networks.
Transfer Learning Default: Avoid training from scratch when suitable pretrained weights exist.
Data Over Architecture: The best architecture can't fix bad data. Spend most of your effort (roughly 80%) on data quality.
Match Inductive Bias: Choose architectures whose assumptions match your data's true structure.
Production Reality: Consider latency, memory, and monitoring from the start.

Before You Start: Model Brief

Answer these before choosing:

Input: Fixed or variable size? Local or global signal?

Output: Label, sequence, mask, ranking, or generation?

Constraints: Latency requirement? Memory budget? Throughput needs?

Risk: Explainability required? False negative vs false positive tolerance?
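
One way to make the brief concrete is to record the answers in a small structure that flags what is still undecided. This is only a sketch: the class name ModelBrief, its field names, and the units are assumptions chosen to mirror the four questions above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelBrief:
    # Input
    variable_size_input: bool
    signal_scope: str                 # "local" or "global"
    # Output
    output_type: str                  # "label", "sequence", "mask", "ranking", or "generation"
    # Constraints (None means "not yet decided")
    max_latency_ms: Optional[float] = None
    memory_budget_mb: Optional[float] = None
    min_throughput_per_s: Optional[float] = None
    # Risk
    explainability_required: bool = False
    costlier_error: str = "unknown"   # "false_negative", "false_positive", or "unknown"

    def open_questions(self):
        """List the brief fields that still lack an answer."""
        constraint_fields = ("max_latency_ms", "memory_budget_mb", "min_throughput_per_s")
        missing = [name for name in constraint_fields if getattr(self, name) is None]
        if self.costlier_error == "unknown":
            missing.append("costlier_error")
        return missing

brief = ModelBrief(variable_size_input=True, signal_scope="global",
                   output_type="label", max_latency_ms=50.0)
unanswered = brief.open_questions()
```

An empty list from open_questions() is a reasonable gate before comparing architectures.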

Common Mistakes

Mistake	Reality
"Transformers are always best"	Often wasteful for tabular data, small vision datasets, and edge deployment
Ignoring classical ML for tabular	XGBoost/LightGBM often win
Training from scratch	Fine-tuning can need far less data (roughly 100x less as a rule of thumb)
Deeper = better	Diminishing returns, overfitting risk
Adding modalities "because it might help"	Complexity without signal = noise

Production Checklist

☐ Model versioned for rollback?
☐ Input validation on API?
☐ Cold start time acceptable?
☐ Fallback if model fails?
☐ Monitoring for drift?
☐ Can recreate from scratch if needed?
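
Two of the checklist items, input validation and a fallback when the model fails, can be sketched together. The function names, the flat list-of-numbers input format, and the constant fallback value are all assumptions for illustration; a real service would validate against its own schema and choose a fallback that suits the product.

```python
import math

def validate_input(features, expected_len):
    """Reject inputs that are the wrong length or contain non-finite numbers."""
    if len(features) != expected_len:
        raise ValueError(f"expected {expected_len} features, got {len(features)}")
    for x in features:
        if not isinstance(x, (int, float)) or not math.isfinite(x):
            raise ValueError(f"invalid feature value: {x!r}")

def predict_with_fallback(model, features, expected_len, fallback):
    """Return (prediction, source), where source is "model" or "fallback"."""
    validate_input(features, expected_len)  # bad input is a caller error, so let it raise
    try:
        return model(features), "model"
    except Exception:
        # Model failure: serve a safe default rather than an error.
        return fallback, "fallback"

def broken_model(features):
    """Stand-in for a model call that fails at serving time."""
    raise RuntimeError("model server unavailable")

result = predict_with_fallback(broken_model, [0.1, 0.2], expected_len=2, fallback=0.0)
```

Returning the source alongside the prediction makes it easy to count fallback responses, which also feeds the monitoring item on the checklist.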

Optimization Options

Technique	Effect
Quantization (float32 → int8)	About 4x smaller; often 2-4x faster on hardware with int8 support
Pruning	Removes near-zero weights for a smaller, sparser model
Distillation	Small student is trained to mimic a large teacher
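
The 4x size figure for quantization follows directly from storing each weight in 1 byte instead of 4. The sketch below shows symmetric per-tensor quantization, one simple scheme among several; the function names and the toy weights are assumptions, and production toolkits add calibration and per-channel scales.

```python
from array import array

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Assumes at least one weight is non-zero.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 0.9]           # toy values for illustration
w32 = array("f", weights)                    # float32 storage: 4 bytes per weight
q8, scale = quantize_int8(weights)           # int8 storage: 1 byte per weight

size_ratio = (w32.itemsize * len(w32)) / (q8.itemsize * len(q8))
max_error = max(abs(a - b) for a, b in zip(weights, dequantize(q8, scale)))
```

Here size_ratio comes out to 4.0, and the rounding error per weight is bounded by half the scale. Whether the smaller model also runs faster depends on the hardware and runtime.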

Full Guide

For comprehensive coverage with worked examples and deep-dives into each architecture family:

The Practitioner's Guide to Choosing Neural Network Architectures

perfecXion.ai