$ timeahead.in

$ articles --tag llama

#llama

100 articles

01

Book publishers sue Meta over AI’s ‘word-for-word’ copying

Meta is facing a class action lawsuit filed by five major book publishers and one author over claims the company “engage…

The Verge AI | Model | #llama #training

88d

02

Extract PDF text in your browser with LiteParse for the web

Extract PDF text in your browser with LiteParse for the web 23rd April 2026 LlamaIndex have a most excellent open source…

Simon Willison Blog | Frameworks | #llama #llamaindex #open-source

100d

03

Ollama is now powered by MLX on Apple Silicon in preview March 30, 2026 Today, we're previewing the fastest way to run Ollama on Apple silicon, powered by MLX, Apple's machine learning framework.

Ollama is now powered by MLX on Apple Silicon in preview March 30, 2026 Today, we’re previewing the fastest way to run O…

Ollama Blog | Hardware | #llama

124d

04

The simplest and fastest way to setup OpenClaw February 23, 2026 Setup OpenClaw in under two minutes with a single Ollama command.

The simplest and fastest way to setup OpenClaw February 23, 2026 OpenClaw is a personal AI assistant that can clear your…

Ollama Blog | Model | #llama

159d

05

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

GGML and llama.cpp join HF to ensure the long-term progress of Local AI Georgi Gerganov and team are joining HF with the…

Hugging Face Blog | Model | #llama #local

162d

06

Subagents and web search in Claude Code February 16, 2026 Ollama now supports subagents and web search in Claude Code.

Subagents and web search in Claude Code February 16, 2026 Ollama now supports subagents and web search in Claude Code. N…

Ollama Blog | Model | #llama #claude #coding

166d

07

Accelerating Long-Context Model Training in JAX and XLA

Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128…

NVIDIA Developer Blog | Model | #llama #training #gpu

179d

08

OpenClaw February 1, 2026 OpenClaw is a personal AI assistant that connects your messaging apps to local AI coding agents, all running on your own device.

OpenClaw February 1, 2026 OpenClaw is a personal AI assistant that bridges your favorite messaging platforms to AI codin…

Ollama Blog | Model | #llama #coding #local

181d

09

ollama launch January 23, 2026 ollama launch is a new command which sets up and runs coding tools like Claude Code, OpenCode, and Codex with local or cloud models. No environment variables or config files needed.

ollama launch January 23, 2026 ollama launch is a new command which sets up and runs your favorite coding tools like Cla…

Ollama Blog | Model | #llama #claude #coding

190d

10

Image generation (experimental) January 20, 2026 Generate images locally with Ollama on macOS. Windows and Linux support coming soon.

Image generation (experimental) January 20, 2026 Ollama now supports image generation on macOS, with Windows and Linux c…

Ollama Blog | Research | #llama #multimodal

193d

11

Claude Code with Anthropic API compatibility January 16, 2026 Ollama is now compatible with the Anthropic Messages API, making it possible to use tools like Claude Code with open models.

Claude Code with Anthropic API compatibility January 16, 2026 Ollama v0.14.0 and later are now compatible with the Anthr…

Ollama Blog | Model | #llama #claude #coding

197d

12

OpenAI Codex with Ollama January 15, 2026 Open models can be used with OpenAI's Codex CLI through Ollama. Codex can read, modify, and execute code in your working directory using models such as gpt-oss:20b, gpt-oss:120b, or other open-weight alternatives.

OpenAI Codex with Ollama January 15, 2026 Open models can be used with OpenAI’s Codex CLI through Ollama. Codex can read…

Ollama Blog | Open Source | #llama #coding

198d

13

Import AI 439: AI kernels; decentralized training; and universal representations

Import AI 439: AI kernels; decentralized training; and universal representations How might a hypothetical superintellige…

Import AI (Jack Clark) | Research | #llama #claude #inference

208d

14

New in llama.cpp: Model Management

New in llama.cpp: Model Management Reminder: llama.cpp server is a lightweight, OpenAI-compatible HTTP server for runnin…

Hugging Face Blog | Model | #llama

233d

15

OVHcloud on Hugging Face Inference Providers 🔥

OVHcloud on Hugging Face Inference Providers 🔥 We're thrilled to share that OVHcloud is now a supported Inference Provi…

Hugging Face Blog | Research | #llama #qwen #fine-tuning

250d

16

OpenAI gpt-oss-safeguard October 29, 2025 Ollama is partnering with OpenAI and ROOST (Robust Open Online Safety Tools) to bring the latest gpt-oss-safeguard reasoning models to users for safety classification tasks. gpt-oss-safeguard models are available in two sizes: 20B and 120B, and are permissively licensed under the Apache 2.0 license.

OpenAI gpt-oss-safeguard October 29, 2025 Ollama is partnering with OpenAI and ROOST (Robust Open Online Safety Tools) t…

Ollama Blog | Open Source | #llama #safety

276d

17

Building Instant RL Loops with Meta Llama Tools and Cerebras October 27, 2025

Oct 27 2025 Building Instant RL Loops with Meta Llama Tools and Cerebras In this post, we’ll show how to use two open-so…

Cerebras Blog | Tutorial | #llama #inference #training

277d

18

MiniMax M2 October 28, 2025 MiniMax M2 is now available on Ollama's cloud. It's a model built for coding and agentic workflows.

MiniMax M2 October 28, 2025 MiniMax M2 is now available on Ollama’s cloud. It’s a model built for coding and agentic wor…

Ollama Blog | Agents | #llama #agents #coding

277d

19

Granite 4.0 Nano: Just how small can you go?

Granite 4.0 Nano: Just how small can you go? Today we are excited to share Granite 4.0 Nano, our smallest models yet, re…

Hugging Face Blog | Infra | #llama #inference #local

277d

20

NVIDIA DGX Spark performance October 23, 2025 We ran performance tests on release day firmware and an updated Ollama version to see how Ollama performs.

NVIDIA DGX Spark performance October 23, 2025 Performance We ran performance tests on release day firmware and an update…

Ollama Blog | Model | #llama #gpu

282d

21

New coding models & integrations October 16, 2025 GLM-4.6 and Qwen3-coder-480B are available on Ollama’s cloud service with easy integrations to the tools you are familiar with. Qwen3-Coder-30B has been updated for faster, more reliable tool calling in Ollama’s new engine.

New coding models & integrations October 16, 2025 GLM-4.6 and Qwen3-coder-480B are available on Ollama’s cloud service w…

Ollama Blog | Agents | #llama #qwen #coding

289d

22

10/15/2025 LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama

Modern AI apps rarely run on a single model forever. Teams iterate, swap providers, and increasingly run open-source mod…

Fireworks AI Blog | Infra | #llama #fine-tuning #inference

290d

23

Qwen3-VL October 14, 2025 Ollama now supports Alibaba's Qwen3-VL.

Qwen3-VL October 14, 2025 Qwen3-VL, the most powerful vision language model in the Qwen series is now available on Ollam…

Ollama Blog | Model | #llama #qwen

291d

24

NVIDIA DGX Spark October 13, 2025 The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it runs fast and efficiently out-of-the-box.

NVIDIA DGX Spark October 13, 2025 The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it run…

Ollama Blog | Model | #llama #gpu

292d

25

Web search September 24, 2025 A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama’s cloud.

Web search September 24, 2025 A new web search API is now available in Ollama. Ollama provides a generous free tier of w…

Ollama Blog | Model | #llama

311d

26

New model scheduling September 23, 2025 Ollama now includes a significantly improved model scheduling system, reducing crashes due to out of memory issues, maximizing GPU utilization and performance, especially on multi-GPU systems.

New model scheduling September 23, 2025 Ollama now includes a significantly improved model scheduling system. Ahead of r…

Ollama Blog | Hardware | #llama

312d

27

Arm & ExecuTorch 0.7: Bringing Generative AI to the masses

Arm & ExecuTorch 0.7: Bringing Generative AI to the masses With Arm’s recent SME2 announcement, the role of Arm KleidiAI…

Hugging Face Blog | Hardware | #llama #coding #embeddings

353d

28

OpenAI gpt-oss August 5, 2025 Ollama partners with OpenAI to bring gpt-oss to Ollama and its community.

OpenAI gpt-oss August 5, 2025 Welcome OpenAI’s gpt-oss! Ollama partners with OpenAI to bring its latest state-of-the-art…

Ollama Blog | Open Source | #llama

361d

29

Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

Measuring Open-Source Llama Nemotron Models on DeepResearch Bench NVIDIA’s AI-Q Blueprint—the leading portable, open dee…

Hugging Face Blog | Research | #llama #open-source

362d

30

Ollama's new app July 30, 2025 Ollama's new app is now available for macOS and Windows.

Ollama's new app July 30, 2025 Ollama’s new app is now available for macOS and Windows. An easier way to chat with model…

Ollama Blog | Model | #llama

367d

31

Ettin Suite: SoTA Paired Encoders and Decoders

Ettin Suite: SoTA Paired Encoders and Decoders TL;DR What would happen if you took the ModernBERT recipe and applied it …

Hugging Face Blog | Model | #llama #training

381d

32

SmolLM3: smol, multilingual, long-context reasoner

SmolLM3: smol, multilingual, long-context reasoner - Base model: https://hf.co/HuggingFaceTB/SmolLM3-3B-Base - Instruct …

Hugging Face Blog | Model | #llama #qwen #training

389d

33

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub TL;DR NVIDIA Llama Nemotron Nano VL is a state-of-the-art…

Hugging Face Blog | Model | #llama #gpu

400d

34

Transformers backend integration in SGLang

Transformers backend integration in SGLang But once you're ready to move from notebooks to production, inference perform…

Hugging Face Blog | Infra | #llama #inference

404d

35

Secure Minions: private collaboration between Ollama and frontier models June 3, 2025 Secure Minions is a secure protocol built by Stanford's Hazy Research lab to allow encrypted local-remote communication.

Secure Minions: private collaboration between Ollama and frontier models June 3, 2025 Three months ago, Stanford’s Hazy …

Ollama Blog | Research | #llama

424d

36

Thinking May 30, 2025 Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking behavior for different applications and use cases.

Thinking May 30, 2025 Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choo…

Ollama Blog | Model | #llama

429d

37

Streaming responses with tool calling May 28, 2025 Ollama now supports streaming responses with tool calling. This enables all chat applications to stream content and also call tools in real time.

Streaming responses with tool calling May 28, 2025 Ollama now supports streaming responses with tool calling. This enabl…

Ollama Blog | Agents | #llama

431d

38

Dell Enterprise Hub is all you need to build AI on premises

Dell Enterprise Hub is all you need to build AI on premises Models Ready for Action If you go to the Dell Enterprise Hub…

Hugging Face Blog | Infra | #llama #training #gpu

435d

39

Ollama's new engine for multimodal models May 15, 2025 Ollama now supports new multimodal models with its new engine.

Ollama's new engine for multimodal models May 15, 2025 Ollama now supports multimodal models via Ollama’s new engine, st…

Ollama Blog | Infra | #llama #multimodal

443d

40

The Transformers Library: standardizing model definitions

The Transformers Library: standardizing model definitions Transformers was created in 2019, shortly following the releas…

Hugging Face Blog | Infra | #llama #qwen #rag

443d

41

Welcoming Llama Guard 4 on Hugging Face Hub

Welcoming Llama Guard 4 on Hugging Face Hub Table-of-Contents What is Llama Guard 4? Vision and large language models de…

Hugging Face Blog | Model | #llama

459d

42

Official Llama API Now Fastest via Groq Inference

Official Llama API Now Fastest via Groq Inference The official Llama API is now accelerated by Groq. Served on the world…

Groq Blog | Infra | #llama #inference

459d

43

4/28/2025 Optimizing Llama 4 Maverick on Fireworks AI

Meta's Llama 4 Maverick is their initial natively-multimodal, Mixture-of-Experts (MoE) model. This model processes both …

Fireworks AI Blog | Research | #llama #fine-tuning #inference

460d

44

Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC

Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC As a preview of what you ca…

Hugging Face Blog | Tutorial | #llama #multimodal #coding

479d

45

Llama 4 in vLLM Apr 5, 2025 · 4 min read We're excited to announce that vLLM now supports the Llama 4 herd of models: Scout (17B-16E) and Maverick (17B-128E). You can run these powerful long-context, natively multi-modal (up to 8-10...

Llama 4 in vLLM We're excited to announce that vLLM now supports the Llama 4 herd of models: Scout (17B-16E) and Maveric…

vLLM Blog | Infra | #llama #inference

483d

46

Welcome Llama 4 Maverick & Scout on Hugging Face

Welcome Llama 4 Maverick & Scout on Hugging Face Released today, these powerful, natively multimodal models represent a …

Hugging Face Blog | Model | #llama

483d

47

Llama 4 Inference Fast & Affordable – Now Live on GroqCloud

Llama 4 Inference Fast & Affordable – Now Live on GroqCloud Meta’s Llama 4 Scout and Maverick models are live today on G…

Groq Blog | Infra | #llama #inference

483d

48

Journey to 1 Million Gradio Users!

Journey to 1 Million Gradio Users! 5 years ago, we launched Gradio as a simple Python library to let researchers at Stan…

Hugging Face Blog | Research | #llama #multimodal #coding

484d

49

Minions: where local and cloud LLMs meet February 25, 2025 Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré's Stanford Hazy Research lab, along with Avner May, Scott Linderman, James Zou, have developed a way to shift a substantial portion of LLM workloads to consumer devices by having small on-device models (such as Llama 3.2 with Ollama) collaborate with larger models in the cloud (such as GPT-4o).

Minions: where local and cloud LLMs meet February 25, 2025 Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christ…

Ollama Blog | Research | #llama #gpt #local

522d

50

Introducing vLLM Inference Provider in Llama Stack Jan 27, 2025 · 8 min read We are excited to announce that vLLM inference provider is now available in Llama Stack through the collaboration between the Red Hat AI Engineering team and the Llama Stack team from Meta. This...

Introducing vLLM Inference Provider in Llama Stack We are excited to announce that vLLM inference provider is now availa…

vLLM Blog | Infra | #llama #inference

551d

51

Mastering Long Contexts in LLMs with KVPress

Mastering Long Contexts in LLMs with KVPress KVPress packs the latest KV cache compression techniques, enabling memory-e…

Hugging Face Blog | Model | #llama

555d

52

Visual Document Retrieval Goes Multilingual

Visual Document Retrieval Goes Multilingual TL;DR: We present vdr-2b-multi-v1, the best multilingual embedding model for…

Hugging Face Blog | Frameworks | #llama #qwen #llamaindex

568d

53

Bamba: Inference-Efficient Hybrid Mamba2 Model

Bamba: Inference-Efficient Hybrid Mamba2 Model 🐍 TL;DR We introduce Bamba-9B, an inference-efficient Hybrid Mamba2 mode…

Hugging Face Blog | Research | #llama #inference #observability

591d

54

Structured outputs December 6, 2024 Ollama now supports structured outputs making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs.

Structured outputs December 6, 2024 Ollama now supports structured outputs making it possible to constrain a model’s out…

Ollama Blog | Hardware | #llama

603d

55

Rearchitecting Hugging Face Uploads and Downloads

Rearchitecting Hugging Face Uploads and Downloads - Uploads from 88 countries - 8.2 million upload requests - 130.8 TB o…

Hugging Face Blog | Infra | #llama #rag

613d

56

Ollama Python library 0.4 with function calling improvements November 25, 2024 With Ollama Python library version 0.4, functions can now be provided as tools. The library now also has full typing support and new examples have been added.

Ollama Python library 0.4 with function calling improvements November 25, 2024 In the latest version of the Ollama Pytho…

Ollama Blog | Agents | #llama

614d

57

You could have designed state of the art positional encoding

You could have designed state of the art positional encoding Gall's Law A complex system that works is invariably found …

Hugging Face Blog | Model | #llama #coding

614d

58

Llama 3.2 Vision November 6, 2024 Llama 3.2 Vision 11B and 90B models are now available in Ollama.

Llama 3.2 Vision November 6, 2024 Llama 3.2 Vision is now available to run in Ollama, in both 11B and 90B sizes. Get sta…

Ollama Blog | Model | #llama #multimodal

633d

59

Universal Assisted Generation: Faster Decoding with Any Assistant Model

Universal Assisted Generation: Faster Decoding with Any Assistant Model gemma-2-9b and Mixtral-8x22B-Instruct-v0.1 lack …

Hugging Face Blog | Open Source | #llama #inference #coding

641d

60

Serving LLMs on AMD MI300X: Best Practices Oct 23, 2024 · 15 min read TL;DR: vLLM unlocks incredible performance on the AMD MI300X, achieving 1.5x higher throughput and 1.7x faster time-to-first-token (TTFT) than Text Generation Inference (TGI) for Llama 3.1 405B....

Serving LLMs on AMD MI300X: Best Practices TL;DR: vLLM unlocks incredible performance on the AMD MI300X, achieving 1.5x …

vLLM Blog | Infra | #llama #inference

647d

61

IBM Granite 3.0 models October 21, 2024 Ollama partners with IBM to bring Granite 3.0 models to Ollama.

IBM Granite 3.0 models October 21, 2024 A selection of IBM Granite 3.0 models are now available to run using Ollama. All…

Ollama Blog | Model | #llama

649d

62

“Llama 3.2 in Keras”

Llama 3.2 in Keras Question: Llama 3.2 landed two weeks ago on Hugging Face / Transformers. When will it be available in…

Hugging Face Blog | Model | #llama

649d

63

10/14/2024 Three projects, one platform: A developer's winning streak with Fireworks AI

When it comes to building with Fireworks AI, few developers can match Nehil Jain's track record. His latest triumph – se…

Fireworks AI Blog | Hardware | #llama #fine-tuning #inference

656d

64

Llama 3.2 goes small and multimodal September 25, 2024 Ollama partners with Meta to bring Llama 3.2 to Ollama.

Llama 3.2 goes small and multimodal September 25, 2024 Meta’s Llama 3.2 is now available to run using Ollama. To get sta…

Ollama Blog | Infra | #llama #multimodal

675d

65

Llama can now see and run on your device - welcome Llama 3.2

Llama can now see and run on your device - welcome Llama 3.2 Llama 3.2 Vision comes in two sizes: 11B for efficient depl…

Hugging Face Blog | Model | #llama

675d

66

9/25/2024 Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference

We are excited to announce support for the newest additions to the Llama collection from Meta. With the addition of Llam…

Fireworks AI Blog | Infra | #llama #fine-tuning #inference

675d

67

Reduce hallucinations with Bespoke-Minicheck September 18, 2024 Bespoke-Minicheck is a new grounded factuality checking model developed by Bespoke Labs that is now available in Ollama. It can fact-check responses generated by other models to detect and reduce hallucinations.

Reduce hallucinations with Bespoke-Minicheck September 18, 2024 Bespoke-Minicheck is a new grounded factuality checking …

Ollama Blog | Model | #llama

682d

68

Accelerate 1.0.0

Accelerate 1.0.0 What is Accelerate today? 3.5 years ago, Accelerate was a simple framework aimed at making training on …

Hugging Face Blog | Hardware | #llama #inference #training

687d

69

vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction Sep 5, 2024 · 12 min read TL;DR: vLLM achieves 2.7x higher throughput and 5x faster TPOT (time per output token) on Llama 8B model, and 1.8x higher throughput and 2x less TPOT on Llama 70B model.

vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction TL;DR: vLLM achieves 2.7x higher throughput and 5x fas…

vLLM Blog | Hardware | #llama #inference

695d

70

Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI

Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI In this blog you will learn how to programmatically deploy meta-lla…

Hugging Face Blog | Model | #llama

712d

71

8/14/2024 Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1

Large Language Models have revolutionized how we retrieve information or build search systems. Retrieval-augmented gener…

Fireworks AI Blog | Infra | #llama #rag #fine-tuning

717d

72

Serverless Inference with Hugging Face and NVIDIA NIM

Serverless Inference with Hugging Face and NVIDIA NIM Inference Providers Update: This service is deprecated and no longe…

Hugging Face Blog | Infra | #llama #mistral #inference

733d

73

Tool support July 25, 2024 Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world.

Tool support July 25, 2024 Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model …

Ollama Blog | Agents | #llama

737d

74

Announcing Llama 3.1 Support in vLLM Jul 23, 2024 · 6 min read Today, the vLLM team is excited to partner with Meta to announce the support for the Llama 3.1 model series. Llama 3.1 comes with exciting new features with longer context length (up to 128K...

Announcing Llama 3.1 Support in vLLM Today, the vLLM team is excited to partner with Meta to announce the support for th…

vLLM Blog | Infra | #llama #inference

739d

75

Run Meta Llama 3.1 405B with an API

Run Meta Llama 3.1 405B with an API Llama 3.1 is the latest language model from Meta. It features a massive 405 billion …

Replicate Blog | Tutorial | #llama #open-source

739d

76

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context Llama 3.1 comes in three sizes: 8B for efficient deploy…

Hugging Face Blog | Model | #llama

739d

77

7/23/2024 Introducing Llama 3.1 inference endpoints in partnership with Meta

We’re thrilled to introduce Llama 3.1 inference endpoints in partnership with Meta. With expanded context length, multil…

Fireworks AI Blog | Infra | #llama #gpt #fine-tuning

739d

78

Google Gemma 2 June 27, 2024 Gemma 2 is now available on Ollama in 3 sizes - 2B, 9B and 27B.

Google Gemma 2 June 27, 2024 Google Gemma 2 is now available in three sizes, 2B, 9B and 27B, featuring a brand new archi…

Ollama Blog | Model | #llama

765d

79

Replicate Intelligence #3

Replicate Intelligence #3 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-…

Replicate Blog | Tutorial | #llama #multimodal

785d

80

Replicate Intelligence #1

Replicate Intelligence #1 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-…

Replicate Blog | Infra | #llama #open-source

799d

81

From cloud to developers: Hugging Face and Microsoft Deepen Collaboration

From cloud to developers: Hugging Face and Microsoft Deepen Collaboration A collaboration for Cloud AI Builders we are e…

Hugging Face Blog | Infra | #llama #mistral #qwen

802d

82

Google announces Firebase Genkit with Ollama support May 20, 2024 At Google IO 2024, Google announced Ollama support in Firebase Genkit, a new open-source framework for developers to build, deploy and monitor production-ready AI-powered apps.

Google announces Firebase Genkit with Ollama support May 20, 2024 At Google IO 2024, Google unveiled Firebase Genkit, fe…

Ollama Blog | Open Source | #llama #coding #open-source

803d

83

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation StarCoder2-15B-Instruct achieve…

Hugging Face Blog | Tutorial | #llama #gpt #coding

824d

84

Llama 3 is not very censored April 19, 2024 Compared to Llama 2, Llama 3 feels much less censored. Meta has substantially lowered false refusal rates. Llama 3 will refuse less than 1/3 of the prompts previously refused by Llama 2.

Llama 3 is not very censored April 19, 2024 Llama 3 feels significantly less censored than its predecessor. The Llama 3 …

Ollama Blog | Model | #llama

834d

85

Run Meta Llama 3 with an API

Run Meta Llama 3 with an API Llama 3 is the latest language model from Meta. It has state of the art performance and a c…

Replicate Blog | Tutorial | #llama

835d

86

Llama 3 April 18, 2024 Llama 3 is now available to run on Ollama. This model is the next generation of Meta's state-of-the-art large language model, and is the most capable openly available LLM to date.

Llama 3 April 18, 2024 Llama 3 is now available to run using Ollama. To get started, Download Ollama and run Llama 3: ol…

Ollama Blog | Model | #llama

835d

87

Welcome Llama 3 - Meta's new open LLM

Welcome Llama 3 - Meta’s new open LLM Introduction Meta’s Llama 3, the next iteration of the open-access Llama family, i…

Hugging Face Blog | Model | #llama

835d

88

4/18/2024 Partnering with Meta to bring Llama 3 to Firework’s inference and fine-tuning

We are pleased to announce the availability of the open-source Llama 3 8B and 70B models with 8k context, served from ou…

Fireworks AI Blog | Infra | #llama #fine-tuning #inference

835d

89

Embedding models April 8, 2024 Embedding models are available in Ollama, making it easy to generate vector embeddings for use in search and retrieval augmented generation (RAG) applications.

Embedding models April 8, 2024 Ollama supports embedding models, making it possible to build retrieval augmented generat…

Ollama Blog | Model | #llama #rag #embeddings

845d

90

Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon

Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon SetFit achieves high accuracy with little labeled data - for…

Hugging Face Blog | Model | #llama #gpt #inference

850d

91

Ollama now supports AMD graphics cards March 14, 2024 Ollama now supports AMD graphics cards in preview on Windows and Linux. All the features of Ollama can now be accelerated by AMD graphics cards on Ollama for Linux and Windows.

Ollama now supports AMD graphics cards March 14, 2024 Ollama now supports AMD graphics cards in preview on Windows and L…

Ollama Blog | Model | #llama

870d

92

Windows preview February 15, 2024 Ollama is now available on Windows in preview, making it possible to pull, run and create large language models in a new native Windows experience. Ollama on Windows includes built-in GPU acceleration, access to the full model library, and serves the Ollama API including OpenAI compatibility.

Windows preview February 15, 2024 Ollama is now available on Windows in preview, making it possible to pull, run and cre…

Ollama Blog | Hardware | #llama

898d

93

OpenAI compatibility February 8, 2024 Ollama now has initial compatibility with the OpenAI Chat Completions API, making it possible to use existing tooling built for OpenAI with local models via Ollama.

OpenAI compatibility February 8, 2024 Ollama now has built-in compatibility with the OpenAI Chat Completions API, making…

Ollama Blog | Model | #llama #local

905d

94

From OpenAI to Open LLMs with Messages API on Hugging Face

From OpenAI to Open LLMs with Messages API on Hugging Face Starting with version 1.4.0, TGI offers an API compatible wit…

Hugging Face Blog | Infra | #llama #gpt #agents

905d

95

Vision models February 2, 2024 New vision models are now available: LLaVA 1.6, in 7B, 13B and 34B parameter sizes. These models support higher resolution images, improved text recognition and logical reasoning.

Vision models February 2, 2024 New LLaVA models The LLaVA (Large Language-and-Vision Assistant) model collection has bee…

Ollama Blog | Infra | #llama #multimodal

911d

96

Hugging Face Text Generation Inference available for AWS Inferentia2

Hugging Face Text Generation Inference available for AWS Inferentia2 Text Generation Inference (TGI), is a purpose-built…

Hugging Face Blog | Hardware | #llama #mistral #inference

912d

97

Run Code Llama 70B with an API

Run Code Llama 70B with an API Code Llama is a code generation model built on top of Llama 2. It can generate code and n…

Replicate Blog | Tutorial | #llama #coding #open-source

914d

98

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding Introduction Recently, code generatio…

Hugging Face Blog | Open Source | #llama #coding #open-source

914d

99

Python & JavaScript Libraries January 23, 2024 The initial versions of the Ollama Python and JavaScript libraries are now available, making it easy to integrate your Python or JavaScript, or Typescript app with Ollama in a few lines of code. Both libraries include all the features of the Ollama REST API, are familiar in design, and compatible with new and previous versions of Ollama.

Python & JavaScript Libraries January 23, 2024 The initial versions of the Ollama Python and JavaScript libraries are no…

Ollama Blog | API | #llama

921d

100

1/18/2024 FireLLaVA: the first commercially permissive OSS LLaVA model

We have come to rely heavily on text as input for foundation models to generate responses. However, in real-world applic…

Fireworks AI Blog | Research | #llama #open-source

926d