====== AI locally ======
docker exec ollama ollama run qwen2.5-coder:3b "Hello"
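A small convenience wrapper (my own sketch, not part of Ollama or Docker) around the `docker exec` pattern above; it assumes a running container named `ollama`:

```shell
# ollama_ask MODEL PROMPT... : run a one-shot prompt against a model
# inside the already-running "ollama" container (container name assumed).
ollama_ask() {
  model="$1"
  shift
  docker exec ollama ollama run "$model" "$*"
}

# Example:
#   ollama_ask qwen2.5-coder:3b "Hello"
```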
Enable thinking: ''ollama run deepseek-r1 --think "prompt"''.
Disable thinking: ''ollama run deepseek-r1 --think=false'' (useful for faster, direct answers).
Hide the trace: ''ollama run deepseek-r1 --hidethinking'' (uses thinking internally but only shows the final answer).
Interactive toggle: while in a chat session, type ''/set think'' or ''/set nothink''.
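The three thinking modes above can be wrapped in a tiny helper (my own sketch, not an Ollama feature) that maps a mode name to the matching flag:

```shell
# think_flags MODE : print the ollama run flag for a thinking mode.
# Modes: on (show trace), off (no thinking), hide (think but hide trace).
think_flags() {
  case "$1" in
    on)   echo "--think" ;;
    off)  echo "--think=false" ;;
    hide) echo "--hidethinking" ;;
    *)    echo "" ;;
  esac
}

# Usage:
#   ollama run deepseek-r1 $(think_flags off) "prompt"
```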
====== AI in DB ======
https://www.db.com/what-we-do/focus-topics/tech/index?language_id=1
Software coding assistants such as Google Gemini Code Assist and GitHub Copilot are used by over 6,000 developers to enhance productivity. These tools contribute to code generation and are especially useful for repetitive tasks, refactoring, test generation, and script writing, helping developers save 1.5 to 2.5 hours weekly. They also support learning by aiding navigation through unfamiliar codebases and legacy systems.
====== Headline ======
[[https://www.reuters.com/business/retail-consumer/amazons-cloud-unit-hit-by-least-two-outages-involving-ai-tools-ft-says-2026-02-20/|Amazon Outage]]
====== Headline ======
Matt Shumer
https://fortune.com/2026/02/11/something-big-is-happening-ai-february-2020-moment-matt-shumer/
https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research
https://claude.ai/login?returnTo=%2F%3F
Here is a much better deal.
Galaxy AI offers you ChatGPT, Claude, Mistral, Llama, Perplexity, Gemini, Grok, Flux, Midjourney, Stable Diffusion and much more, all together for only 15 dollars a month.
====== Run ollama ======
Anthropic's tooling just became budget-friendly: as of a few days ago, you can run the Claude Code tool at $0 cost using Ollama and open-source models. Here is how to do it:
Until now, using Anthropic's agentic coding tool meant paying for every token via their API. That changed this week.
Get it running with this simple 5-step guide:
**1. Install Ollama**
**2. Pull a strong open-source coding model**
ollama pull qwen2.5-coder:3b
**3. Install Claude Code**
Option 1:
<code bash>
curl -fsSL https://claude.ai/install.sh | bash
</code>
Option 2:
<code bash>
npm install -g @anthropic-ai/claude-code
</code>
**4. Connect**
Point Claude Code to your local server instead of Anthropic's cloud:
<code bash>
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
</code>
**5. Run it**
<code bash>
claude --model qwen2.5-coder
</code>
The barrier to entry for agentic coding just dropped to zero.

====== Build AI video ======

I wrote the script and storyline with ChatGPT, generated the scene images with Google Gemini 3.0, created the video clips with Kling AI 2.6 and Higgsfield AI, then pulled everything together in CapCut with edits, subtitles and music. Different worlds, different transitions, all stitched into a smooth 30-second story.

====== AI components ======

The Pi 5 setup I have is about $700 new, and could be down to $300–400 if you use a used graphics card or one you already own. Here's my exact setup (some links are affiliate links):

  * Raspberry Pi 5 8GB ($80)
  * Raspberry Pi 27W Power Supply ($14)
  * 1TB USB SSD ($64)
  * Pineboards HatDrive! Bottom ($20)
  * JMT M.2 Key to PCIe eGPU Dock ($55)
  * OCuLink cable ($20)
  * Lian-Li SFX 750W PSU ($130)
  * AMD RX 6700 XT ($400)

https://de.aliexpress.com/item/1005002802776587.html?spm=a2g0o.productlist.main.1.7b995f57azGpoe&algo_pvid=832b082f-f622-4687-8adf-3c48fef960c7&algo_exp_id=832b082f-f622-4687-8adf-3c48fef960c7-0&pdp_ext_f=%7B%22order%22%3A%22611%22%2C%22eval%22%3A%221%22%2C%22fromPage%22%3A%22search%22%7D&pdp_npi=6%40dis%21EUR%21217.98%21102.99%21%21%21251.51%21118.84%21%40211b619a17581952204622458e99c5%2112000022259296715%21sea%21DE%210%21ABX%211%210%21n_tag%3A-29910%3Bd%3A4bf434c9%3Bm03_new_user%3A-29895%3BpisId%3A5000000174645366&curPageLogUid=HJcKq2XQNmSS&utparam-url=scene%3Asearch%7Cquery_from%3A%7Cx_object_id%3A1005002802776587%7C_p_origin_prod%3A

====== Selfhosted AI ======

https://www.youtube.com/watch?v=9hni6rLfMTg
https://www.jeffgeerling.com/blog/2024/llms-accelerated-egpu-on-raspberry-pi-5

===== Quick take =====

  * CPU-only works, but a recent 4-core (or better) x86-64 chip with AVX-512 and 16 GB RAM is the real-world floor for snappy inference with small (3–7 B) models.
  * For comfortable daily use, and to load 13 B models, plan on 32 GB system RAM and an SSD with ≥ 50 GB free (model files add up fast).
  * GPU is optional for inference but slashes latency.
    * NVIDIA: any CUDA-capable card with ≥ 8 GB VRAM (RTX 3060/4060 or newer) can accelerate 7–13 B models; bigger models scale roughly //1 GB VRAM ≈ 1 B parameters// when quantised.
    * AMD: ROCm preview support landed in 2024; modern RDNA 3 or Instinct cards work, older GPUs may need environment overrides.
  * Model RAM rules of thumb (inference, 4-bit quantised):

=== Model-size vs RAM ===

^ Model (4-bit) ^ RAM required ^
| 7 B | 8 GB |
| 13 B | 16 GB |
| 70 B | 64 GB |

===== 1. Baseline self-hosted inference stack =====

==== CPU, RAM & storage ====

^ Component ^ Minimum (7 B) ^ Comfortable (13 B) ^ Large-model (30–70 B) ^
| CPU | 4 cores @ 3 GHz | 8 cores | 16 cores / EPYC-class |
| System RAM | 16 GB | 32 GB | 64–128 GB |
| Disk (SSD) | 50 GB free | 100 GB+ | 256 GB+ |

==== GPU acceleration (optional) ====

  * NVIDIA (CUDA ≥ 11.8)
    * 8 GB VRAM → 7 B models
    * 12 GB VRAM → 13 B models
    * 24 GB+ VRAM → 30 B+ models or half-precision fine-tuning
    * Driver & toolkit versions must match the runtime container.
  * AMD (ROCm 6.x preview)
    * Works on RX 7000-series and Instinct MI cards.
    * For unsupported GPUs you can set ''HSA_OVERRIDE_GFX_VERSION'' to bypass checks (with caveats).

==== Operating systems ====

  * Linux (Ubuntu 22.04 LTS+) – best-supported.
  * macOS (Apple Silicon) – runs great on CPU/GPU but no training acceleration.
  * Windows 11 – requires WSL 2 for CUDA/ROCm access.

===== 2. Making the model learn =====

  - Fine-tune with a standard framework – Hugging Face Transformers, PEFT/LoRA, or QLoRA scripts.
  - Resource planning – see table below.
  - Load the tuned model back into Ollama:

<code bash>
# after converting to GGUF and referencing it from a Modelfile
ollama create my-custom -f Modelfile
ollama run my-custom
</code>

==== Fine-tuning options compared ====

^ Fine-tuning flavour ^ Typical extra requirements ^ What that means in practice ^
| LoRA / QLoRA | ~ 8 GB GPU VRAM for a 7 B model | Can be done on a single consumer card; training fits in 16–32 GB system RAM. |
| Full fine-tune | ≈ 16 GB GPU VRAM per 1 B parameters | Even a 7 B full fine-tune needs 40–60 GB VRAM; usually split across multiple GPUs or done in the cloud. |

===== 3. Cost-savvy build recommendations =====

^ Budget ^ Suggested rig ^ Why ^
| Entry (≤ €600) | Used desktop i7-8700, 32 GB RAM, NVMe SSD | Great for 7 B inference CPU-only; tinkering with 1–3 B LoRA fine-tunes. |
| Mid-range (~ €1200) | Ryzen 7 7800X3D, 32 GB DDR5, RTX 4060 12 GB | Smooth 13 B inference on GPU; LoRA fine-tunes of 7 B models. |
| Pro (~ €3000+) | Threadripper 7955WX, 128 GB RAM, dual RTX 4090 24 GB or MI300X | Handles 70 B inference and LoRA on 30 B; dabble in full fine-tunes. |

===== 4. Final tips =====

  * Quantise smartly – 4-bit GGUF cuts memory in half with minimal quality loss.
  * Keep RAM ≥ 1.5 × VRAM – avoids swapping during large context windows.
  * Use SSDs – model paging from spinning disks kills latency.
  * Profile early – ''ollama benchmark'' and ''nvidia-smi''/''rocm-smi'' reveal real bottlenecks.
  * Prototype in the cloud for full training runs, then serve locally to stay private.
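The //1 GB VRAM ≈ 1 B parameters// rule of thumb from the quick take can be turned into a back-of-envelope sizing helper. Note the ~25% headroom for KV cache and runtime overhead is my own assumption, not a benchmarked figure:

```shell
# estimate_vram_gb PARAMS_B : rough VRAM needed for a 4-bit quantised model.
# Base rule: ~1 GB per billion parameters, plus ~25% headroom (assumption)
# for KV cache and runtime overhead. Integer math, rounded up.
estimate_vram_gb() {
  params_b="$1"   # model size in billions of parameters
  echo $(( params_b + (params_b + 3) / 4 ))
}

estimate_vram_gb 7     # prints 9
estimate_vram_gb 13    # prints 17
```

So a 12 GB card comfortably covers 7 B models, while 13 B models want 16 GB or more, which matches the model-size vs RAM table above.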
