$ timeahead.in
$ articles --tag llama

#llama

100 articles

01
Book publishers sue Meta over AI’s ‘word-for-word’ copying
Meta is facing a class action lawsuit filed by five major book publishers and one author over claims the company “engage…
The Verge AI · Model · #llama #training
41d
02
Extract PDF text in your browser with LiteParse for the web
Extract PDF text in your browser with LiteParse for the web 23rd April 2026 LlamaIndex have a most excellent open source…
Simon Willison Blog · Frameworks · #llama #llamaindex #open-source
53d
03
Ollama is now powered by MLX on Apple Silicon in preview March 30, 2026 Today, we're previewing the fastest way to run Ollama on Apple silicon, powered by MLX, Apple's machine learning framework.
Ollama is now powered by MLX on Apple Silicon in preview March 30, 2026 Today, we’re previewing the fastest way to run O…
Ollama Blog · Hardware · #llama
77d
04
The simplest and fastest way to setup OpenClaw February 23, 2026 Setup OpenClaw in under two minutes with a single Ollama command.
The simplest and fastest way to setup OpenClaw February 23, 2026 OpenClaw is a personal AI assistant that can clear your…
Ollama Blog · Model · #llama
112d
05
GGML and llama.cpp join HF to ensure the long-term progress of Local AI
GGML and llama.cpp join HF to ensure the long-term progress of Local AI Georgi Gerganov and team are joining HF with the…
Hugging Face Blog · Model · #llama #local
115d
06
Subagents and web search in Claude Code February 16, 2026 Ollama now supports subagents and web search in Claude Code.
Subagents and web search in Claude Code February 16, 2026 Ollama now supports subagents and web search in Claude Code. N…
Ollama Blog · Model · #llama #claude #coding
119d
07
Accelerating Long-Context Model Training in JAX and XLA
Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128…
NVIDIA Developer Blog · Model · #llama #training #gpu
132d
08
OpenClaw February 1, 2026 OpenClaw is a personal AI assistant that connects your messaging apps to local AI coding agents, all running on your own device.
OpenClaw February 1, 2026 OpenClaw is a personal AI assistant that bridges your favorite messaging platforms to AI codin…
Ollama Blog · Model · #llama #coding #local
134d
09
ollama launch January 23, 2026 ollama launch is a new command which sets up and runs coding tools like Claude Code, OpenCode, and Codex with local or cloud models. No environment variables or config files needed.
ollama launch January 23, 2026 ollama launch is a new command which sets up and runs your favorite coding tools like Cla…
Ollama Blog · Model · #llama #claude #coding
143d
10
Image generation (experimental) January 20, 2026 Generate images locally with Ollama on macOS. Windows and Linux support coming soon.
Image generation (experimental) January 20, 2026 Ollama now supports image generation on macOS, with Windows and Linux c…
Ollama Blog · Research · #llama #multimodal
146d
11
Claude Code with Anthropic API compatibility January 16, 2026 Ollama is now compatible with the Anthropic Messages API, making it possible to use tools like Claude Code with open models.
Claude Code with Anthropic API compatibility January 16, 2026 Ollama v0.14.0 and later are now compatible with the Anthr…
Ollama Blog · Model · #llama #claude #coding
150d
12
OpenAI Codex with Ollama January 15, 2026 Open models can be used with OpenAI's Codex CLI through Ollama. Codex can read, modify, and execute code in your working directory using models such as gpt-oss:20b, gpt-oss:120b, or other open-weight alternatives.
OpenAI Codex with Ollama January 15, 2026 Open models can be used with OpenAI’s Codex CLI through Ollama. Codex can read…
Ollama Blog · Open Source · #llama #coding
151d
13
Import AI 439: AI kernels; decentralized training; and universal representations
Import AI 439: AI kernels; decentralized training; and universal representations How might a hypothetical superintellige…
Import AI (Jack Clark) · Research · #llama #claude #inference
161d
14
New in llama.cpp: Model Management
New in llama.cpp: Model Management Reminder: llama.cpp server is a lightweight, OpenAI-compatible HTTP server for runnin…
Hugging Face Blog · Model · #llama
186d
15
OVHcloud on Hugging Face Inference Providers 🔥
OVHcloud on Hugging Face Inference Providers 🔥 We're thrilled to share that OVHcloud is now a supported Inference Provi…
Hugging Face Blog · Research · #llama #qwen #fine-tuning
203d
16
OpenAI gpt-oss-safeguard October 29, 2025 Ollama is partnering with OpenAI and ROOST (Robust Open Online Safety Tools) to bring the latest gpt-oss-safeguard reasoning models to users for safety classification tasks. gpt-oss-safeguard models are available in two sizes: 20B and 120B, and are permissively licensed under the Apache 2.0 license.
OpenAI gpt-oss-safeguard October 29, 2025 Ollama is partnering with OpenAI and ROOST (Robust Open Online Safety Tools) t…
Ollama Blog · Open Source · #llama #safety
229d
17
Building Instant RL Loops with Meta Llama Tools and Cerebras October 27, 2025
Oct 27 2025 Building Instant RL Loops with Meta Llama Tools and Cerebras In this post, we’ll show how to use two open-so…
Cerebras Blog · Tutorial · #llama #inference #training
230d
18
MiniMax M2 October 28, 2025 MiniMax M2 is now available on Ollama's cloud. It's a model built for coding and agentic workflows.
MiniMax M2 October 28, 2025 MiniMax M2 is now available on Ollama’s cloud. It’s a model built for coding and agentic wor…
Ollama Blog · Agents · #llama #agents #coding
230d
19
Granite 4.0 Nano: Just how small can you go?
Granite 4.0 Nano: Just how small can you go? Today we are excited to share Granite 4.0 Nano, our smallest models yet, re…
Hugging Face Blog · Infra · #llama #inference #local
230d
20
NVIDIA DGX Spark performance October 23, 2025 We ran performance tests on release day firmware and an updated Ollama version to see how Ollama performs.
NVIDIA DGX Spark performance October 23, 2025 Performance We ran performance tests on release day firmware and an update…
Ollama Blog · Model · #llama #gpu
235d
21
New coding models & integrations October 16, 2025 GLM-4.6 and Qwen3-coder-480B are available on Ollama’s cloud service with easy integrations to the tools you are familiar with. Qwen3-Coder-30B has been updated for faster, more reliable tool calling in Ollama’s new engine.
New coding models & integrations October 16, 2025 GLM-4.6 and Qwen3-coder-480B are available on Ollama’s cloud service w…
Ollama Blog · Agents · #llama #qwen #coding
242d
22
10/15/2025 LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama
Modern AI apps rarely run on a single model forever. Teams iterate, swap providers, and increasingly run open-source mod…
Fireworks AI Blog · Infra · #llama #fine-tuning #inference
243d
23
Qwen3-VL October 14, 2025 Ollama now supports Alibaba's Qwen3-VL.
Qwen3-VL October 14, 2025 Qwen3-VL, the most powerful vision language model in the Qwen series is now available on Ollam…
Ollama Blog · Model · #llama #qwen
244d
24
NVIDIA DGX Spark October 13, 2025 The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it runs fast and efficiently out-of-the-box.
NVIDIA DGX Spark October 13, 2025 The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it run…
Ollama Blog · Model · #llama #gpu
245d
25
Web search September 24, 2025 A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama’s cloud.
Web search September 24, 2025 A new web search API is now available in Ollama. Ollama provides a generous free tier of w…
Ollama Blog · Model · #llama
264d
26
New model scheduling September 23, 2025 Ollama now includes a significantly improved model scheduling system, reducing crashes due to out of memory issues, maximizing GPU utilization and performance, especially on multi-GPU systems.
New model scheduling September 23, 2025 Ollama now includes a significantly improved model scheduling system. Ahead of r…
Ollama Blog · Hardware · #llama
265d
27
Arm & ExecuTorch 0.7: Bringing Generative AI to the masses
Arm & ExecuTorch 0.7: Bringing Generative AI to the masses With Arm’s recent SME2 announcement, the role of Arm KleidiAI…
Hugging Face Blog · Hardware · #llama #coding #embeddings
306d
28
OpenAI gpt-oss August 5, 2025 Ollama partners with OpenAI to bring gpt-oss to Ollama and its community.
OpenAI gpt-oss August 5, 2025 Welcome OpenAI’s gpt-oss! Ollama partners with OpenAI to bring its latest state-of-the-art…
Ollama Blog · Open Source · #llama
314d
29
Measuring Open-Source Llama Nemotron Models on DeepResearch Bench
Measuring Open-Source Llama Nemotron Models on DeepResearch Bench NVIDIA’s AI-Q Blueprint—the leading portable, open dee…
Hugging Face Blog · Research · #llama #open-source
315d
30
Ollama's new app July 30, 2025 Ollama's new app is now available for macOS and Windows.
Ollama's new app July 30, 2025 Ollama’s new app is now available for macOS and Windows. An easier way to chat with model…
Ollama Blog · Model · #llama
320d
31
Ettin Suite: SoTA Paired Encoders and Decoders
Ettin Suite: SoTA Paired Encoders and Decoders TL;DR What would happen if you took the ModernBERT recipe and applied it …
Hugging Face Blog · Model · #llama #training
334d
32
SmolLM3: smol, multilingual, long-context reasoner
SmolLM3: smol, multilingual, long-context reasoner - Base model: https://hf.co/HuggingFaceTB/SmolLM3-3B-Base - Instruct …
Hugging Face Blog · Model · #llama #qwen #training
342d
33
Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub
Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub TL;DR NVIDIA Llama Nemotron Nano VL is a state-of-the-art…
Hugging Face Blog · Model · #llama #gpu
353d
34
Transformers backend integration in SGLang
Transformers backend integration in SGLang But once you're ready to move from notebooks to production, inference perform…
Hugging Face Blog · Infra · #llama #inference
357d
35
Secure Minions: private collaboration between Ollama and frontier models June 3, 2025 Secure Minions is a secure protocol built by Stanford's Hazy Research lab to allow encrypted local-remote communication.
Secure Minions: private collaboration between Ollama and frontier models June 3, 2025 Three months ago, Stanford’s Hazy …
Ollama Blog · Research · #llama
377d
36
Thinking May 30, 2025 Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking behavior for different applications and use cases.
Thinking May 30, 2025 Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choo…
Ollama Blog · Model · #llama
382d
37
Streaming responses with tool calling May 28, 2025 Ollama now supports streaming responses with tool calling. This enables all chat applications to stream content and also call tools in real time.
Streaming responses with tool calling May 28, 2025 Ollama now supports streaming responses with tool calling. This enabl…
Ollama Blog · Agents · #llama
384d
38
Dell Enterprise Hub is all you need to build AI on premises
Dell Enterprise Hub is all you need to build AI on premises Models Ready for Action If you go to the Dell Enterprise Hub…
Hugging Face Blog · Infra · #llama #training #gpu
388d
39
Ollama's new engine for multimodal models May 15, 2025 Ollama now supports new multimodal models with its new engine.
Ollama's new engine for multimodal models May 15, 2025 Ollama now supports multimodal models via Ollama’s new engine, st…
Ollama Blog · Infra · #llama #multimodal
396d
40
The Transformers Library: standardizing model definitions
The Transformers Library: standardizing model definitions Transformers was created in 2019, shortly following the releas…
Hugging Face Blog · Infra · #llama #qwen #rag
396d
41
Welcoming Llama Guard 4 on Hugging Face Hub
Welcoming Llama Guard 4 on Hugging Face Hub Table-of-Contents What is Llama Guard 4? Vision and large language models de…
Hugging Face Blog · Model · #llama
412d
42
Official Llama API Now Fastest via Groq Inference
Official Llama API Now Fastest via Groq Inference The official Llama API is now accelerated by Groq. Served on the world…
Groq Blog · Infra · #llama #inference
412d
43
4/28/2025 Optimizing Llama 4 Maverick on Fireworks AI
Meta's Llama 4 Maverick is their initial natively-multimodal, Mixture-of-Experts (MoE) model. This model processes both …
Fireworks AI Blog · Research · #llama #fine-tuning #inference
413d
44
Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC
Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC As a preview of what you ca…
Hugging Face Blog · Tutorial · #llama #multimodal #coding
432d
45
Llama 4 in vLLM Apr 5, 2025 · 4 min read We're excited to announce that vLLM now supports the Llama 4 herd of models: Scout (17B-16E) and Maverick (17B-128E). You can run these powerful long-context, natively multi-modal (up to 8-10...
Llama 4 in vLLM We're excited to announce that vLLM now supports the Llama 4 herd of models: Scout (17B-16E) and Maveric…
vLLM Blog · Infra · #llama #inference
436d
46
Welcome Llama 4 Maverick & Scout on Hugging Face
Welcome Llama 4 Maverick & Scout on Hugging Face Released today, these powerful, natively multimodal models represent a …
Hugging Face Blog · Model · #llama
436d
47
Llama 4 Inference Fast & Affordable – Now Live on GroqCloud
Llama 4 Inference Fast & Affordable – Now Live on GroqCloud Meta’s Llama 4 Scout and Maverick models are live today on G…
Groq Blog · Infra · #llama #inference
436d
48
Journey to 1 Million Gradio Users!
Journey to 1 Million Gradio Users! 5 years ago, we launched Gradio as a simple Python library to let researchers at Stan…
Hugging Face Blog · Research · #llama #multimodal #coding
437d
49
Minions: where local and cloud LLMs meet February 25, 2025 Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré's Stanford Hazy Research lab, along with Avner May, Scott Linderman, James Zou, have developed a way to shift a substantial portion of LLM workloads to consumer devices by having small on-device models (such as Llama 3.2 with Ollama) collaborate with larger models in the cloud (such as GPT-4o).
Minions: where local and cloud LLMs meet February 25, 2025 Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christ…
Ollama Blog · Research · #llama #gpt #local
475d
50
Introducing vLLM Inference Provider in Llama Stack Jan 27, 2025 · 8 min read We are excited to announce that vLLM inference provider is now available in Llama Stack through the collaboration between the Red Hat AI Engineering team and the Llama Stack team from Meta. This...
Introducing vLLM Inference Provider in Llama Stack We are excited to announce that vLLM inference provider is now availa…
vLLM Blog · Infra · #llama #inference
504d
51
Mastering Long Contexts in LLMs with KVPress
Mastering Long Contexts in LLMs with KVPress KVPress packs the latest KV cache compression techniques, enabling memory-e…
Hugging Face Blog · Model · #llama
508d
52
Visual Document Retrieval Goes Multilingual
Visual Document Retrieval Goes Multilingual TL;DR: We present vdr-2b-multi-v1, the best multilingual embedding model for…
Hugging Face Blog · Frameworks · #llama #qwen #llamaindex
521d
53
Bamba: Inference-Efficient Hybrid Mamba2 Model
Bamba: Inference-Efficient Hybrid Mamba2 Model 🐍 TL;DR We introduce Bamba-9B, an inference-efficient Hybrid Mamba2 mode…
Hugging Face Blog · Research · #llama #inference #observability
544d
54
Structured outputs December 6, 2024 Ollama now supports structured outputs making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs.
Structured outputs December 6, 2024 Ollama now supports structured outputs making it possible to constrain a model’s out…
Ollama Blog · Hardware · #llama
556d
55
Rearchitecting Hugging Face Uploads and Downloads
Rearchitecting Hugging Face Uploads and Downloads - Uploads from 88 countries - 8.2 million upload requests - 130.8 TB o…
Hugging Face Blog · Infra · #llama #rag
566d
56
Ollama Python library 0.4 with function calling improvements November 25, 2024 With Ollama Python library version 0.4, functions can now be provided as tools. The library now also has full typing support and new examples have been added.
Ollama Python library 0.4 with function calling improvements November 25, 2024 In the latest version of the Ollama Pytho…
Ollama Blog · Agents · #llama
567d
57
You could have designed state of the art positional encoding
You could have designed state of the art positional encoding Gall's Law A complex system that works is invariably found …
Hugging Face Blog · Model · #llama #coding
567d
58
Llama 3.2 Vision November 6, 2024 Llama 3.2 Vision 11B and 90B models are now available in Ollama.
Llama 3.2 Vision November 6, 2024 Llama 3.2 Vision is now available to run in Ollama, in both 11B and 90B sizes. Get sta…
Ollama Blog · Model · #llama #multimodal
586d
59
Universal Assisted Generation: Faster Decoding with Any Assistant Model
Universal Assisted Generation: Faster Decoding with Any Assistant Model gemma-2-9b and Mixtral-8x22B-Instruct-v0.1 lack …
Hugging Face Blog · Open Source · #llama #inference #coding
594d
60
Serving LLMs on AMD MI300X: Best Practices Oct 23, 2024 · 15 min read TL;DR: vLLM unlocks incredible performance on the AMD MI300X, achieving 1.5x higher throughput and 1.7x faster time-to-first-token (TTFT) than Text Generation Inference (TGI) for Llama 3.1 405B....
Serving LLMs on AMD MI300X: Best Practices TL;DR: vLLM unlocks incredible performance on the AMD MI300X, achieving 1.5x …
vLLM Blog · Infra · #llama #inference
600d
61
IBM Granite 3.0 models October 21, 2024 Ollama partners with IBM to bring Granite 3.0 models to Ollama.
IBM Granite 3.0 models October 21, 2024 A selection of IBM Granite 3.0 models are now available to run using Ollama. All…
Ollama Blog · Model · #llama
602d
62
“Llama 3.2 in Keras”
Llama 3.2 in Keras Question: Llama 3.2 landed two weeks ago on Hugging Face / Transformers. When will it be available in…
Hugging Face Blog · Model · #llama
602d
63
10/14/2024 Three projects, one platform: A developer's winning streak with Fireworks AI
When it comes to building with Fireworks AI, few developers can match Nehil Jain's track record. His latest triumph – se…
Fireworks AI Blog · Hardware · #llama #fine-tuning #inference
609d
64
Llama 3.2 goes small and multimodal September 25, 2024 Ollama partners with Meta to bring Llama 3.2 to Ollama.
Llama 3.2 goes small and multimodal September 25, 2024 Meta’s Llama 3.2 is now available to run using Ollama. To get sta…
Ollama Blog · Infra · #llama #multimodal
628d
65
Llama can now see and run on your device - welcome Llama 3.2
Llama can now see and run on your device - welcome Llama 3.2 Llama 3.2 Vision comes in two sizes: 11B for efficient depl…
Hugging Face Blog · Model · #llama
628d
66
9/25/2024 Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference
We are excited to announce support for the newest additions to the Llama collection from Meta. With the addition of Llam…
Fireworks AI Blog · Infra · #llama #fine-tuning #inference
628d
67
Reduce hallucinations with Bespoke-Minicheck September 18, 2024 Bespoke-Minicheck is a new grounded factuality checking model developed by Bespoke Labs that is now available in Ollama. It can fact-check responses generated by other models to detect and reduce hallucinations.
Reduce hallucinations with Bespoke-Minicheck September 18, 2024 Bespoke-Minicheck is a new grounded factuality checking …
Ollama Blog · Model · #llama
635d
68
Accelerate 1.0.0
Accelerate 1.0.0 What is Accelerate today? 3.5 years ago, Accelerate was a simple framework aimed at making training on …
Hugging Face Blog · Hardware · #llama #inference #training
640d
69
vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction Sep 5, 2024 · 12 min read TL;DR: vLLM achieves 2.7x higher throughput and 5x faster TPOT (time per output token) on Llama 8B model, and 1.8x higher throughput and 2x less TPOT on Llama 70B model.
vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction TL;DR: vLLM achieves 2.7x higher throughput and 5x fas…
vLLM Blog · Hardware · #llama #inference
648d
70
Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI
Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI In this blog you will learn how to programmatically deploy meta-lla…
Hugging Face Blog · Model · #llama
665d
71
8/14/2024 Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1
Large Language Models have revolutionized how we retrieve information or build search systems. Retrieval-augmented gener…
Fireworks AI Blog · Infra · #llama #rag #fine-tuning
670d
72
Serverless Inference with Hugging Face and NVIDIA NIM
Serverless Inference with Hugging Face and NVIDIA NIM Inference Providers Update: This service is deprecated and no longe…
Hugging Face Blog · Infra · #llama #mistral #inference
686d
73
Tool support July 25, 2024 Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world.
Tool support July 25, 2024 Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model …
Ollama Blog · Agents · #llama
690d
74
Announcing Llama 3.1 Support in vLLM Jul 23, 2024 · 6 min read Today, the vLLM team is excited to partner with Meta to announce the support for the Llama 3.1 model series. Llama 3.1 comes with exciting new features with longer context length (up to 128K...
Announcing Llama 3.1 Support in vLLM Today, the vLLM team is excited to partner with Meta to announce the support for th…
vLLM Blog · Infra · #llama #inference
692d
75
Run Meta Llama 3.1 405B with an API
Run Meta Llama 3.1 405B with an API Llama 3.1 is the latest language model from Meta. It features a massive 405 billion …
Replicate Blog · Tutorial · #llama #open-source
692d
76
Llama 3.1 - 405B, 70B & 8B with multilinguality and long context
Llama 3.1 - 405B, 70B & 8B with multilinguality and long context Llama 3.1 comes in three sizes: 8B for efficient deploy…
Hugging Face Blog · Model · #llama
692d
77
7/23/2024 Introducing Llama 3.1 inference endpoints in partnership with Meta
We’re thrilled to introduce Llama 3.1 inference endpoints in partnership with Meta. With expanded context length, multil…
Fireworks AI Blog · Infra · #llama #gpt #fine-tuning
692d
78
Google Gemma 2 June 27, 2024 Gemma 2 is now available on Ollama in 3 sizes - 2B, 9B and 27B.
Google Gemma 2 June 27, 2024 Google Gemma 2 is now available in three sizes, 2B, 9B and 27B, featuring a brand new archi…
Ollama Blog · Model · #llama
718d
79
Replicate Intelligence #3
Replicate Intelligence #3 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-…
Replicate Blog · Tutorial · #llama #multimodal
738d
80
Replicate Intelligence #1
Replicate Intelligence #1 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-…
Replicate Blog · Infra · #llama #open-source
752d
81
From cloud to developers: Hugging Face and Microsoft Deepen Collaboration
From cloud to developers: Hugging Face and Microsoft Deepen Collaboration A collaboration for Cloud AI Builders we are e…
Hugging Face Blog · Infra · #llama #mistral #qwen
755d
82
Google announces Firebase Genkit with Ollama support May 20, 2024 At Google IO 2024, Google announced Ollama support in Firebase Genkit, a new open-source framework for developers to build, deploy and monitor production-ready AI-powered apps.
Google announces Firebase Genkit with Ollama support May 20, 2024 At Google IO 2024, Google unveiled Firebase Genkit, fe…
Ollama Blog · Open Source · #llama #coding #open-source
756d
83
StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation
StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation StarCoder2-15B-Instruct achieve…
Hugging Face Blog · Tutorial · #llama #gpt #coding
777d
84
Llama 3 is not very censored April 19, 2024 Compared to Llama 2, Llama 3 feels much less censored. Meta has substantially lowered false refusal rates. Llama 3 will refuse less than 1/3 of the prompts previously refused by Llama 2.
Llama 3 is not very censored April 19, 2024 Llama 3 feels significantly less censored than its predecessor. The Llama 3 …
Ollama Blog · Model · #llama
787d
85
Run Meta Llama 3 with an API
Run Meta Llama 3 with an API Llama 3 is the latest language model from Meta. It has state of the art performance and a c…
Replicate Blog · Tutorial · #llama
788d
86
Llama 3 April 18, 2024 Llama 3 is now available to run on Ollama. This model is the next generation of Meta's state-of-the-art large language model, and is the most capable openly available LLM to date.
Llama 3 April 18, 2024 Llama 3 is now available to run using Ollama. To get started, Download Ollama and run Llama 3: ol…
Ollama Blog · Model · #llama
788d
87
Welcome Llama 3 - Meta's new open LLM
Welcome Llama 3 - Meta’s new open LLM Introduction Meta’s Llama 3, the next iteration of the open-access Llama family, i…
Hugging Face Blog · Model · #llama
788d
88
4/18/2024 Partnering with Meta to bring Llama 3 to Firework’s inference and fine-tuning
We are pleased to announce the availability of the open-source Llama 3 8B and 70B models with 8k context, served from ou…
Fireworks AI Blog · Infra · #llama #fine-tuning #inference
788d
89
Embedding models April 8, 2024 Embedding models are available in Ollama, making it easy to generate vector embeddings for use in search and retrieval augmented generation (RAG) applications.
Embedding models April 8, 2024 Ollama supports embedding models, making it possible to build retrieval augmented generat…
Ollama Blog · Model · #llama #rag #embeddings
798d
90
Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon
Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon SetFit achieves high accuracy with little labeled data - for…
Hugging Face Blog · Model · #llama #gpt #inference
803d
91
Ollama now supports AMD graphics cards March 14, 2024 Ollama now supports AMD graphics cards in preview on Windows and Linux. All the features of Ollama can now be accelerated by AMD graphics cards on Ollama for Linux and Windows.
Ollama now supports AMD graphics cards March 14, 2024 Ollama now supports AMD graphics cards in preview on Windows and L…
Ollama Blog · Model · #llama
823d
92
Windows preview February 15, 2024 Ollama is now available on Windows in preview, making it possible to pull, run and create large language models in a new native Windows experience. Ollama on Windows includes built-in GPU acceleration, access to the full model library, and serves the Ollama API including OpenAI compatibility.
Windows preview February 15, 2024 Ollama is now available on Windows in preview, making it possible to pull, run and cre…
Ollama Blog · Hardware · #llama
851d
93
OpenAI compatibility February 8, 2024 Ollama now has initial compatibility with the OpenAI Chat Completions API, making it possible to use existing tooling built for OpenAI with local models via Ollama.
OpenAI compatibility February 8, 2024 Ollama now has built-in compatibility with the OpenAI Chat Completions API, making…
Ollama Blog · Model · #llama #local
858d
94
From OpenAI to Open LLMs with Messages API on Hugging Face
From OpenAI to Open LLMs with Messages API on Hugging Face Starting with version 1.4.0, TGI offers an API compatible wit…
Hugging Face Blog · Infra · #llama #gpt #agents
858d
95
Vision models February 2, 2024 New vision models are now available: LLaVA 1.6, in 7B, 13B and 34B parameter sizes. These models support higher resolution images, improved text recognition and logical reasoning.
Vision models February 2, 2024 New LLaVA models The LLaVA (Large Language-and-Vision Assistant) model collection has bee…
Ollama Blog · Infra · #llama #multimodal
864d
96
Hugging Face Text Generation Inference available for AWS Inferentia2
Hugging Face Text Generation Inference available for AWS Inferentia2 Text Generation Inference (TGI), is a purpose-built…
Hugging Face Blog · Hardware · #llama #mistral #inference
865d
97
Run Code Llama 70B with an API
Run Code Llama 70B with an API Code Llama is a code generation model built on top of Llama 2. It can generate code and n…
Replicate Blog · Tutorial · #llama #coding #open-source
867d
98
Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding
Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding Introduction Recently, code generatio…
Hugging Face Blog · Open Source · #llama #coding #open-source
867d
99
Python & JavaScript Libraries January 23, 2024 The initial versions of the Ollama Python and JavaScript libraries are now available, making it easy to integrate your Python or JavaScript, or Typescript app with Ollama in a few lines of code. Both libraries include all the features of the Ollama REST API, are familiar in design, and compatible with new and previous versions of Ollama.
Python & JavaScript Libraries January 23, 2024 The initial versions of the Ollama Python and JavaScript libraries are no…
Ollama Blog · API · #llama
874d
100
1/18/2024 FireLLaVA: the first commercially permissive OSS LLaVA model
We have come to rely heavily on text as input for foundation models to generate responses. However, in real-world applic…
Fireworks AI Blog · Research · #llama #open-source
879d