AIOR Neural Network Services
At AIOR we build end-to-end pipelines for customer projects, covering both neural network development and deployment. Our hosting packages come with TensorFlow, PyTorch, Keras and scikit-learn preinstalled; our dedicated GPU packages ship with CUDA and cuDNN ready to use. We provide consulting on anomaly detection, predictive maintenance, OCR and customer segmentation projects.
Neural Networks and Modern AI
Artificial Neural Networks (ANNs) have been the backbone of AI's rapid progress since the 2010s. Major advances in image recognition, natural language processing, game playing (AlphaGo), medical diagnostics and self-driving vehicles rest on these models. At AIOR we use neural networks across customer projects for anomaly detection, predictive maintenance, OCR pipelines and customer segmentation.
This guide covers the foundational methods of neural network modelling from a practical angle.
Core Architecture Types
Different tasks call for different architectures:
- Multi-Layer Perceptron (MLP): the classic feedforward network for tabular (spreadsheet-like) data. Typically 3-7 layers with up to a few hundred neurons each. The bread-and-butter model for structured-data tasks such as churn prediction and credit scoring.
- Convolutional Neural Network (CNN): for images and other spatial data. Convolutional filters slide over the input and produce feature maps. ResNet, VGG and EfficientNet are popular architectures. Used in industrial anomaly detection, medical imaging and OCR.
- Recurrent Neural Network (RNN/LSTM/GRU): for time-series and sequential data. Carries a hidden state (memory) across time steps. Long the standard for stock prediction, language generation and machine translation; now largely replaced by Transformers, especially in language tasks.
- Transformer: processes all positions of a sequence in parallel via the attention mechanism. The foundation of BERT, GPT and T5, and the heart of modern LLM applications.
- Autoencoder: learns to reproduce its own input; the narrow bottleneck layer yields a compact feature representation. Used for anomaly detection, denoising and dimensionality reduction.
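To make the attention mechanism behind the Transformer more concrete, here is a minimal NumPy sketch of scaled dot-product attention. It is an illustration under simplifying assumptions: a single head, no masking, and the same toy matrix used as queries, keys and values, whereas a real Transformer derives each of them through learned projections.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k: (seq_len, d_k); v: (seq_len, d_v). Returns (output, weights)."""
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.T / np.sqrt(d_k)
    # Subtract the row maximum before exponentiating for numerical stability
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    # Each output row is a weighted average of the value vectors
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens with 8-dimensional embeddings (toy values)
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape)         # one output vector per token
print(attn.sum(axis=1))  # each row of attention weights sums to 1
```

Because every row of `scores` is computed with one matrix product, all positions are processed at once, which is what lets Transformers parallelise where RNNs must step through the sequence.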
Data Preparation
Neural networks are extremely sensitive to data quality. Preparation steps:
Code:
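import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load data
df = pd.read_csv("data.csv")

# Placeholder names: replace "label" with your own target column
target = "label"
features = df.drop(columns=[target]).select_dtypes("number").columns.tolist()

# Split first so that test rows never influence the preprocessing statistics
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, random_state=42, stratify=df[target]
)
X_train, X_test = X_train.copy(), X_test.copy()

# Impute missing values with the training-set means
train_means = X_train.mean()
X_train = X_train.fillna(train_means)
X_test = X_test.fillna(train_means)

# Outlier handling: clip each feature to its 1st and 99th training percentiles
for col in features:
    low, high = X_train[col].quantile([0.01, 0.99])
    X_train[col] = X_train[col].clip(low, high)
    X_test[col] = X_test[col].clip(low, high)

# Feature scaling, critical for neural nets: fit on the training data only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)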
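Important: without feature scaling, neural networks often train slowly or fail to converge, because features on very different scales lead to poorly conditioned gradients. Standardise numeric inputs, and fit the scaler on the training split only so that test-set statistics do not leak into training.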
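As an illustration of what standardisation does, the NumPy sketch below computes the per-feature mean and standard deviation on the training rows and reuses them for new rows. The helper names and the toy numbers are made up for this example; in practice `StandardScaler` does the same job.

```python
import numpy as np

def fit_standardizer(x_train):
    """Learn per-feature mean and standard deviation from the training rows."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)  # population std (ddof=0)
    std[std == 0] = 1.0        # leave constant features unscaled instead of dividing by zero
    return mean, std

def apply_standardizer(x, mean, std):
    """Scale any split with statistics learned from the training split."""
    return (x - mean) / std

# Two features on very different scales (toy values)
x_train = np.array([[1000.0, 0.1], [2000.0, 0.2], [3000.0, 0.3]])
x_test = np.array([[2500.0, 0.25]])

mean, std = fit_standardizer(x_train)
x_train_s = apply_standardizer(x_train, mean, std)
x_test_s = apply_standardizer(x_test, mean, std)
print(x_train_s.mean(axis=0), x_train_s.std(axis=0))
print(x_test_s)
```

After scaling, both features have zero mean and unit variance on the training set, so neither dominates the gradient simply because of its units.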
A Simple MLP Model (Keras)
A starter-level model in TensorFlow/Keras:
Code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.metrics import AUC

model = Sequential([
    Dense(128, activation='relu', input_shape=(X_train.shape[1],)),
    BatchNormalization(),
    Dropout(0.3),
    Dense(64, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid'),  # single probability output for binary classification
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    # Name the AUC metric explicitly so it is logged as 'auc' / 'val_auc'
    metrics=['accuracy', AUC(name='auc')],
)
model.summary()
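A quick way to sanity-check the output of `model.summary()` is to count the parameters by hand. The sketch below does this for the Dense layers of the model above (one weight per input-unit pair plus one bias per unit), assuming a hypothetical input width of 20 features; it also counts the four values that each BatchNormalization layer keeps per unit.

```python
def dense_params(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units

def batchnorm_params(units):
    """Scale and shift (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * units

def mlp_dense_params(n_features, layer_units):
    """Total Dense parameters of a feedforward stack."""
    total, n_in = 0, n_features
    for units in layer_units:
        total += dense_params(n_in, units)
        n_in = units  # the next layer sees this layer's outputs
    return total

n_features = 20  # assumption for illustration; use X_train.shape[1] in practice
dense_total = mlp_dense_params(n_features, [128, 64, 32, 1])
bn_total = batchnorm_params(128) + batchnorm_params(64)
print(dense_total)             # 13057
print(dense_total + bn_total)  # 13825
```

Dropout layers add no parameters, so with 20 input features the summary should report 13,825 parameters in total.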
Training with Callbacks
Code:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

callbacks = [
    # Stop after 10 epochs without improvement and roll back to the best weights
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    # Halve the learning rate after 5 stagnant epochs, down to a floor of 1e-6
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6),
    # 'val_auc' assumes the AUC metric is logged under the name 'auc'
    ModelCheckpoint('best_model.h5', save_best_only=True, monitor='val_auc', mode='max'),
]

history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=100,
    batch_size=64,
    callbacks=callbacks,
    verbose=1,
)
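EarlyStopping limits overfitting by halting training once validation loss has stopped improving for `patience` epochs; ReduceLROnPlateau lowers the learning rate when validation loss plateaus; ModelCheckpoint saves the best model seen so far.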
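The patience logic behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras implementation: it ignores options such as `min_delta`, and the validation losses are invented for the example.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) as 0-based epoch indices."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop here, restore weights from best_epoch
    return len(val_losses) - 1, best_epoch  # ran out of epochs without triggering

# Validation loss improves until epoch 3, then drifts upward (toy values)
losses = [0.70, 0.55, 0.48, 0.47, 0.49, 0.50, 0.52, 0.51]
stop_epoch, best_epoch = simulate_early_stopping(losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

Training stops at epoch 6, three epochs after the best result at epoch 3; with `restore_best_weights=True` the model is rolled back to the weights from that best epoch.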
Fighting Overfitting
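Neural networks overfit easily. Effective remedies:
- Dropout: randomly zeroes a fraction of activations during training, which acts as regularisation; it is switched off at inference time.
- Batch Normalization: normalises each layer's activations over the mini-batch; stabilises training and usually accelerates convergence.
- L1/L2 regularisation: adds a penalty on weight magnitude to the loss, which discourages large weights:
Code:
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

# L2 penalty with factor 0.001 on this layer's kernel weights
Dense(64, kernel_regularizer=l2(0.001))
- Data augmentation: rotation, flip and crop for images; for imbalanced tabular data, SMOTE or other oversampling, applied to the training split only.
- Early stopping: halt training when validation loss starts rising.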
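Two of the remedies above are easy to show numerically. The NumPy sketch below illustrates inverted dropout (zero a random fraction of activations and rescale the survivors so the expected value stays the same) and an L2 penalty computed as the factor times the sum of squared weights. The function names and toy arrays are chosen for this example only.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Zero roughly `rate` of the activations and rescale the rest by 1 / (1 - rate)."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

def l2_penalty(weights, factor):
    """Extra loss term: factor * sum of squared weights."""
    return factor * np.sum(np.square(weights))

rng = np.random.default_rng(42)
a = np.ones((1000, 64))  # toy activations, all equal to 1
dropped = inverted_dropout(a, rate=0.3, rng=rng)
zero_fraction = float((dropped == 0).mean())
print(zero_fraction)          # close to the dropout rate
print(float(dropped.mean()))  # close to the original mean of 1.0

w = np.array([[0.5, -1.0], [2.0, 0.0]])
penalty = l2_penalty(w, factor=0.001)
print(penalty)  # 0.001 * (0.25 + 1.0 + 4.0)
```

The rescaling is why no adjustment is needed at inference time: the layer's expected output during training already matches its output with dropout disabled.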
Hyperparameter Tuning
Architecture choice, number of layers, batch size, learning rate: the search space is too large to explore reliably by hand, so it pays to automate the search:
Code:
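import keras_tuner as kt
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.metrics import AUC
from tensorflow.keras.models import Sequential

def build_model(hp):
    model = Sequential()
    # Tune the depth, then the width and dropout rate of each layer
    for i in range(hp.Int('num_layers', 2, 5)):
        model.add(Dense(
            hp.Int(f'units_{i}', 32, 256, step=32),
            activation=hp.Choice('activation', ['relu', 'tanh'])
        ))
        model.add(Dropout(hp.Float(f'dropout_{i}', 0.1, 0.5, step=0.1)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(
        optimizer=hp.Choice('optimizer', ['adam', 'rmsprop']),
        loss='binary_crossentropy',
        metrics=[AUC(name='auc')],  # logged as 'auc' / 'val_auc'
    )
    return model

tuner = kt.RandomSearch(
    build_model,
    objective=kt.Objective('val_auc', direction='max'),
    max_trials=20,
)
tuner.search(X_train, y_train, validation_split=0.2, epochs=50)
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]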
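Conceptually, random search is simple: sample configurations from the search space, score each one, keep the best. The plain-Python sketch below shows that loop. The search space mirrors the ranges used above, while `toy_score` is an invented stand-in for "train the model and return its validation AUC".

```python
import random

SEARCH_SPACE = {
    "num_layers": [2, 3, 4, 5],
    "units": list(range(32, 257, 32)),
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "optimizer": ["adam", "rmsprop"],
}

def sample_config(rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(score_fn, max_trials, seed=0):
    """Evaluate max_trials random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(max_trials):
        config = sample_config(rng)
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_score(config):
    # Hypothetical objective: pretends 128 units and 0.3 dropout are ideal
    return -abs(config["units"] - 128) / 256 - abs(config["dropout"] - 0.3)

best_config, best_score = random_search(toy_score, max_trials=20)
print(best_config, best_score)
```

With a real model each call to the scoring function is a full training run, which is why the number of trials, not the sampling itself, dominates the cost.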
Model Interpretability
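Neural networks are famously "black boxes", but interpretability tools exist:
- SHAP (SHapley Additive exPlanations): shows how much each feature contributed to each individual prediction.
- LIME: a local explanation method that approximates the model around a single prediction with a simple interpretable surrogate, typically a sparse linear model.
- Permutation importance: shuffle one feature at a time and measure how much model performance drops.
For neural networks that drive business decisions, interpretability is effectively a requirement: data-protection rules such as KVKK and GDPR Article 22, which restricts solely automated decisions with legal or similarly significant effects, make it important to be able to explain how an outcome was reached.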
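Permutation importance, described above, needs nothing beyond a prediction function and a metric. The NumPy sketch below is a minimal version under illustrative assumptions: the data is synthetic, and `toy_model` is a stand-in for a trained network's predict function that only looks at the first feature.

```python
import numpy as np

def permutation_importance(predict_fn, X, y, metric_fn, n_repeats=5, seed=0):
    """Mean drop in the metric when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric_fn(y, predict_fn(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric_fn(y, predict_fn(X_perm)))
        importances[j] = np.mean(drops)
    return importances

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)  # the label depends on feature 0 only

def toy_model(X):
    # Stand-in for model.predict: thresholds feature 0 and ignores the rest
    return (X[:, 0] > 0).astype(int)

imp = permutation_importance(toy_model, X, y, accuracy)
print(imp)
```

Feature 0 shows a large drop in accuracy when shuffled, while the two unused features score exactly zero. Measure the drop on held-out data so that the importances reflect generalisation rather than memorisation.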
Production Deployment
Once training is done, the model goes to production:
- TensorFlow Serving: gRPC + REST endpoint
- ONNX: framework-agnostic, ideal for edge deployment
- NVIDIA Triton: GPU inference, high throughput
- Flask/FastAPI + saved model: for simple cases
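For the simple Flask/FastAPI case, most of the work is validating the request and wrapping the model call. The sketch below keeps that logic framework-independent, using only the standard library. The payload shape (`instances` in, `predictions` out), the function names and the stand-in `predict_fn` are assumptions made for illustration, not a prescribed API.

```python
import json

def make_predict_handler(predict_fn, n_features):
    """Build a function that maps a JSON request body to (status_code, response_body)."""
    def handle(body):
        try:
            rows = json.loads(body)["instances"]
        except (json.JSONDecodeError, KeyError, TypeError):
            return 400, json.dumps({"error": "expected JSON with an 'instances' list"})
        # Every row must be a list with exactly the number of features the model expects
        if not isinstance(rows, list) or any(
            not isinstance(row, list) or len(row) != n_features for row in rows
        ):
            return 400, json.dumps({"error": f"each instance needs {n_features} values"})
        return 200, json.dumps({"predictions": predict_fn(rows)})
    return handle

def toy_predict(rows):
    # Hypothetical stand-in for preprocessing followed by model.predict
    return [sum(row) for row in rows]

handle = make_predict_handler(toy_predict, n_features=2)
print(handle('{"instances": [[1, 2], [3, 4]]}'))
print(handle('{"instances": [[1]]}'))
```

A Flask or FastAPI route would pass the request body to `handle` and return its status code and body. In a real service, `predict_fn` should apply the same scaler that was fitted during training before calling the model.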