Higress Updates: AI Capabilities Open-Sourced and Cloud-Native Capabilities Upgraded

1. Introduction to the New Version

The latest version 1.4 of Higress builds on the experience accumulated from providing AI gateways for Tongyi Qianwen and many AGI vendors on the cloud, and open-sources a large number of AI-native gateway capabilities. It also upgrades cloud-native capabilities across the board, including Ingress, observability, and flow control:

1. Comprehensive open-sourcing of AI capabilities: multiple out-of-the-box plugins are provided, covering security protection, multi-model adaptation, observability, caching, prompt engineering, and other areas. Core capabilities include:

a. AI proxy plugin: supports integration with multiple vendors' protocols, covering 15 LLM providers in total, which basically covers the mainstream large model vendors at home and abroad.

b. AI content audit plugin: supports integration with the Alibaba Cloud Content Security service, which can intercept harmful language, misleading information, discriminatory remarks, illegal content, and so on.

c. AI statistics plugin: supports statistics on token throughput, generates Prometheus metrics in real time, and prints related information in the access log.

d. AI rate-limiting plugin: supports backend-protective rate limiting based on token throughput, and also supports precise call quota limits configured for calling tenants.

e. AI development plugin set: provides LLM result caching, prompt decoration, and other capabilities that facilitate the development of AI applications.

2. Upgraded cloud-native capabilities:

a. Optimized ultra-large-scale routing configuration: at a scale of 10,000 routes, the time for a newly added route to take effect is reduced from 10 seconds in version 1.3 to 3 seconds, a significant advantage over gateways such as the ingress-nginx controller.

b. Simplified HTTPS certificate management: supports a single global configuration for unified management of domain certificates, solving the ingress-nginx pain point of having to copy certificate Secrets across namespaces, and supports integration with Let's Encrypt for automatic issuance and renewal of free certificates without depending on cert-manager.

c. Cluster flow control plugin: supports integration with Redis for cluster-level flow control, enabling globally unified rate limiting at the granularity of header/URL parameter/IP.

d. Log observation: the Higress UI console provides out-of-the-box gateway log query capabilities.

e. Minimal deployment: does not depend on K8s and can be started as a single Docker container, making it easy for individual developers to use in a local environment.

2. Comprehensive Open-Sourcing of AI Capabilities

Since 2020, the Alibaba Cloud microservices team has incubated the cloud-native gateway product Higress by serving the needs of Alibaba's internal businesses and cloud customers, and formally open-sourced it at the 2022 Apsara Conference. While sharing code and knowledge with the open-source community, Higress has been further improved through feedback from a large number of open-source users.

Today, Higress is not only the API gateway for Alibaba Cloud's core AI services such as Tongyi Qianwen, but also the API gateway for many AGI vendors on the cloud. We are happy to share the experience accumulated in serving these scenarios and to fully open-source the related AI capabilities:

 

All of the plug-ins in the image above have been open-sourced and can be used directly out of the box in the latest version of the Higress console:

 

AI-based plugin development tools will also be released on the Higress open-source website in the future, so stay tuned.

2.1 Natural Advantages in Carrying AI Traffic

Traffic passing through the gateway in AI scenarios has three main characteristics that distinguish it from other business traffic:

- Long connections: because the WebSocket and SSE protocols are common in AI scenarios, the proportion of long-lived connections is very high, and the gateway must ensure that configuration updates have no impact on long-lived connections and do not affect the business.

- High latency: the response latency of LLM inference is much higher than that of ordinary applications, which makes AI applications vulnerable to malicious attacks: concurrency attacks built from slow requests are cheap for the attacker but impose high overhead on the server side.

- Large bandwidth: combined with the back-and-forth transmission of LLM context and the high-latency characteristic, AI scenarios consume far more bandwidth than ordinary applications; without good streaming processing and memory recycling mechanisms, gateway memory usage can rise rapidly.

Higress has a natural advantage in dealing with such traffic characteristics:

- Lossless hot updates for long connections: unlike Nginx, where configuration changes require a reload that drops connections, Higress is based on Envoy and achieves true hot updates with no loss of connections.

- Security gateway capabilities: Higress's security gateway capabilities provide multi-dimensional CC protection such as per-IP and per-cookie limits, and for AI scenarios it supports rate limiting oriented to token throughput in addition to QPS.

- Efficient streaming: Higress supports fully streaming forwarding, and its data plane is based on Envoy, written in C++, so its memory footprint in high-bandwidth scenarios is very low. Memory is cheap compared with GPUs, but improper memory control leads to OOM, causing business downtime and immeasurable losses.

The graph below, from Sealos, an open-source user of Higress, compares resource usage after migrating its gateway from ingress-nginx to Higress, with memory usage dropping to a tenth of a percent of its previous level:

 

2.2 AI Proxy Plugin

Higress supports integrating the APIs of multiple large model vendors and exposing them through a unified protocol (based on OpenAI's API protocol), which shields implementation details and makes things convenient for developers.
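For example, the following is a minimal configuration sketch for proxying OpenAI-protocol requests to Tongyi Qianwen; the apiTokens value is a placeholder and the model mappings are illustrative:

provider:
  type: qwen                       # target LLM provider
  apiTokens:
    - "YOUR_DASHSCOPE_API_KEY"     # placeholder credential
  modelMapping:
    'gpt-3.5-turbo': "qwen-turbo"  # map OpenAI model names to provider models
    '*': "qwen-max"                # fallback mapping for all other model names

Clients keep calling the gateway with the OpenAI API protocol, and the plugin translates each request to the selected provider's protocol.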

Currently supported large model APIs include: Tongyi Qianwen, OpenAI/Azure OpenAI, Moonshot AI, Baichuan AI, 01.AI, Zhipu AI, StepFun, Baidu ERNIE Bot, Tencent Hunyuan, DeepSeek, Anthropic Claude, Groq, MiniMax, and Ollama.

This basically covers the mainstream large model APIs on the market. The work was done jointly by several community developers, whose GitHub IDs are: CH3CHO, hanxiantao, lizzy-0323, goooogoooo, cr7258, xychen5, Claire-w, Chi-Kai, and Suchun-sv.

Thanks to these passionate community developers, Higress's AI capabilities can reach more ecosystems. There are still protocol adaptation tasks for other models waiting to be claimed; interested developers are welcome to claim them here: https://github.com/alibaba/higress/issues/940

Using the AI proxy plugin, you can also integrate Tongyi Qianwen's qwen-long model and upload documents to implement RAG. The figure below shows the conversation after we configured the Higress documentation in the AI proxy plugin (the document needs to be uploaded in advance through the API provided by Tongyi Qianwen to obtain a fileId) and chatted through LobeChat, an open-source front-end tool based on the OpenAI protocol. This can be considered the simplest way to build a RAG application:

 

 

If you want to build a similar RAG application, you can refer here: https://github.com/alibaba/higress/issues/1023#issuecomment-2163176897
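As a sketch, the qwen-long + document setup described above corresponds to an AI proxy plugin configuration along the following lines (the apiToken and fileId values are placeholders):

provider:
  type: qwen
  apiTokens:
    - "YOUR_DASHSCOPE_API_KEY"     # placeholder credential
  modelMapping:
    '*': "qwen-long"               # route all requests to the long-context model
  qwenFileIds:
    - "file-fe-xxxxxxxxxxxx"       # fileId returned by the Tongyi Qianwen file upload API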

2.3 AI Content Audit Plugin

Large models are typically trained on data that is widely available on the Internet, and in the process they may learn and reproduce harmful content or undesirable speech. Without proper filtering and monitoring, the responses they generate may contain harmful language, misleading information, discriminatory speech, or even content that violates laws and regulations. It is precisely because of this potential risk that content security for large models is exceptionally important.

In Higress, a simple configuration is enough to integrate with Alibaba Cloud Content Security [1] to protect the compliance of large-model Q&A. Content Security provides a variety of detection scopes, which users can configure in its console:

 

Plugin configuration example:

serviceSource: dns
serviceName: safecheck
servicePort: 443
domain: green-cip.cn-shanghai.aliyuncs.com

 

Request/response example: after configuration, if a request or response contains illegal content and is intercepted by Content Security, the gateway will return an answer carrying the content security suggestion:

 

2.4 AI Statistics Plugin

Compared with traditional microservices, LLM applications mainly measure traffic size in tokens. For this characteristic, we have built token usage observability at the route, service, and model levels, including logging, monitoring, and alerting.
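For illustration only, the token metrics scraped by Prometheus might look like the lines below; treat the metric and label names as assumptions for the sake of example rather than the plugin's exact output:

# Input/output token counts broken down by route, upstream service, and model (illustrative names)
route_upstream_model_input_token{ai_route="qwen-route",ai_cluster="qwen",ai_model="qwen-turbo"} 21
route_upstream_model_output_token{ai_route="qwen-route",ai_cluster="qwen",ai_model="qwen-turbo"} 17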

 

The figure below shows the monitoring of a Tongyi Qianwen service proxied by the gateway:

 

Relevant statistics can also be printed out in the logs:

 

 

2.5 AI Rate-Limiting Plugin

A mature API gateway product should provide two types of rate-limiting capability, and Higress meets both needs:

 

Rate-limiting scenario: backend-protective rate limiting
Purpose: protect the backend against abnormal traffic and malicious attacks
Example statistical dimensions: API (a given API limited to 100 QPS), IP (each IP may not exceed 10 QPS), Cookie (each cookie may not exceed 10 QPS)

Rate-limiting scenario: caller quota limiting
Purpose: works with the caller authentication mechanism to differentiate quality of service (QoS) for different callers
Example statistical dimensions: caller (consumer): gold members limited to 1000 QPS, silver members to 100 QPS, bronze members to 10 QPS

 

In AI scenarios, rate limiting is not limited to the traditional requests per second/minute/hour/day (QPS/QPM/QPH/QPD), but extends to managing tokens per minute/hour/day (TPM/TPH/TPD). The "T" stands for token, the unit used to measure a large model's input and output. For AI applications, token-based metering reflects resource and cost usage better than traditional request counting.

 

The figure below shows OpenAI's Tier 2 limits for callers across different models; most AI products have similar limits:

 

In AI scenarios, backend-protective rate limiting is also very important and often easy to overlook. Many LLM providers offer free web applications, and gray-market actors may scrape these web calls and repackage them as APIs sold to users for profit. In such cases, Higress's protective rate limiting on dimensions such as IP and cookie can provide protection.

 

Higress supports a rich set of rate-limiting capabilities:

 

Each of the following dimensions supports both request-based metering (QPS/QPM/QPH/QPD) and token-based metering (TPS/TPM/TPH/TPD):

- API
- IP
- Cookie
- Request header
- URL parameter
- Caller

 

For example, to limit each IP to 1000 tokens per minute, IPs matching the 1.1.1.0/24 segment to 100, and the single IP 1.1.1.1 to 10, you can configure the following:

 

rule_name: limit_ip
rule_items:
- limit_by_per_ip: from-remote-addr
  limit_keys:
  - key: 1.1.1.1
    token_per_minute: 10
  - key: 1.1.1.0/24
    token_per_minute: 100
  - key: 0.0.0.0/0
    token_per_minute: 1000
redis:
  service_name: redis.static
  service_port: 6379

 

2.6 AI Retrieval-Augmented Generation Plugin

Based on this plugin, LLM-RAG applications can be built by integrating with the Alibaba Cloud vector retrieval service; the flow is shown in the figure below:

 

For example, based on the CEC-Corpus [2] dataset, which contains 332 breaking news reports with their raw corpus and annotations, we can extract the original press release text, vectorize it, and import it into the Alibaba Cloud vector retrieval service, then complete the corresponding plugin configuration in Higress to quickly create a private-domain knowledge assistant.

 

Plugin configuration example:

 

dashscope:
  apiKey: xxxxxxxxxxxxxxxxxxxxxx
  serviceName: dashscope
  servicePort: 443
  domain: dashscope.aliyuncs.com
dashvector:
  apiKey: xxxxxxxxxxxxxxxxxxxxxxxx
  serviceName: dashvector
  servicePort: 443
  domain: vrs-cn-xxxxxxxxxxxxxxxxxxxxxxxx.dashvector.cn-hangzhou.aliyuncs.com
  collection: news_embedings

 

Example request response:

 

 

2.7 AI Cache Plugin

This plugin implements extraction and caching of LLM responses, which can significantly reduce response latency and save cost in scenarios where the same question is sent to the LLM API at high frequency. We previously used Higress + Tongyi Qianwen to translate technical content, and the translation capability of LLMs is very strong; the English documents on the Higress official website are also translated with LLMs. Because the documents need continuous updating and iteration, we use GitHub Actions for automated CI/CD. Combined with this AI cache plugin and document-slice translation (as shown in the figure below), we achieve a low-cost, efficient automated document translation process.
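As a rough sketch of enabling the cache, the configuration below follows the plugin's general shape; treat the field names and GJSON paths as assumptions to verify against the plugin documentation:

redis:
  serviceName: redis.static                    # Redis service used as the cache store
  servicePort: 6379
  timeout: 2000                                # Redis operation timeout in milliseconds
cacheKeyFrom:
  requestBody: "messages.@reverse.0.content"   # use the latest user message as the cache key
cacheValueFrom:
  responseBody: "choices.0.message.content"    # cache the content of the LLM's first choice
cacheTTL: 3600                                 # cached answers expire after one hour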

 

The AI cache plugin will further evolve to support cache recall of LLM responses based on question-vector similarity, which can significantly reduce LLM API call costs in closed knowledge domain scenarios such as RAG. Making the trade-off between cost and effectiveness is challenging, and for this we have organized the Higress AI Gateway Challenge; your participation is welcome.

 

2.8 AI Prompt Template Plugin

The prompt template plugin is used to quickly build prompts in a fixed format, which is helpful for applications that need to restrict the question format. You can configure prompt templates on the gateway and then expose corresponding APIs backed by the large model's capabilities.

 

Example of plugin configuration:

 

templates:
- name: developer-chat
  template:
    model: gpt-3.5-turbo
    messages:
    - role: system
      content: 'You are a {{program}} expert in {{language}}'
    - role: user
      content: 'Write me a {{program}} program'

 

Example request:

 

{
  "template": "developer-chat",
  "properties": {
    "program": "quick sort algorithm",
    "language": "python"
  }
}

 

After template transformation, the actual request sent to the LLM is:

 

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a quick sort algorithm expert in python"
    },
    {
      "role": "user",
      "content": "Write me a quick sort algorithm program"
    }
  ]
}

 

2.9 AI Prompt Decorator Plugin

The prompt decorator plugin is also used to adjust prompts, supporting the insertion of additional prompts before and after the prompt entered by the user. Users can centralize prompt-manipulation logic at the Higress AI gateway: all LLM API traffic passes through Higress, and prompt manipulation is applied automatically, achieving unified control of prompts.

 

Sample plugin configuration:

 

decorators:
- name: data-assistant
  decorator:
    prepend:
    - role: system
      content: If you are asked a question about plugins, you should answer with the name, function, execution phase, and execution priority of all plugins.
    append:
    - role: user
      content: You should answer in the form of a table, with no content other than the table.

 

Request Response Example:

 

 

2.10 AI Request/Response Transformation Plugin

By configuring the AI request/response transformation plugin, users can modify the gateway's requests and responses directly in natural language, without writing code. For example, when testing an API, a tester can configure this plugin on the API under test and use the original request/response as an example to generate requests/responses for boundary-condition testing. Large models are often more thorough than humans here, making it easier to cover edge cases.

 

Example of plugin configuration:

 

response:
  enable: true
  prompt: "Help me modify the following HTTP response message: 1. change content-type to application/json; 2. convert the body from xml to json; 3. remove content-length."
provider:
  serviceName: qwen
  domain: dashscope.aliyuncs.com
  apiKey: sk-xxxxxxxxxxxxxxxxxxxxxxx

 

Request/response example: create a route that proxies to the target endpoint through the gateway. The endpoint returns a response in XML format, which the plugin processes into a JSON response:

 

{
  "slideshow": {
    "title": "Sample Slide Show",
    "date": "date of publication",
    "author": "Yours Truly",
    "slides": [
      {
        "type": "all",
        "title": "Wake up to WonderWidgets!"
      },
      {
        "type": "all",
        "title": "Overview",
        "items": [
          "Why <em>WonderWidgets</em> are great",
          "Who <em>buys</em> WonderWidgets"
        ]
      }
    ]
  }
}

 

3. Cloud-Native Capability Upgrades

 

3.1 Support for Ultra-Large-Scale Routing Configuration

Sealos' Higress practice has attracted many large-scale enterprise users with the same pain point of managing Ingress at scale to adopt Higress. Based on user feedback, we have continued to optimize route-change speed for this kind of large-scale routing configuration. Compared with version 1.3, the speed has increased by more than 3 times: with 10,000 Ingress configurations, a single Ingress change takes effect in only 3 seconds.

 

The following is a set of comparison tests against similar gateways (all on their latest versions). With 10,000 Ingresses in place, we add one Ingress and measure how long it takes for the new route to take effect (verified by requesting the gateway until a 200 status code is returned), then delete the Ingress and measure how long the route removal takes to take effect (verified by a 404 status code). Higress's advantage is significant:

 

 

 

 

3.2 Simplified HTTPS Certificate Management

Ingress certificate management has always been a pain point: for security reasons, the K8s standard specifies that an Ingress resource can only reference Secrets in its own namespace. So when businesses are segregated by namespace but share the same domain name, the Secret must be copied into multiple namespaces, which not only increases the operation and maintenance burden but also poses security risks.

 

Higress can now use a ConfigMap to do global certificate management for Ingress:

 

apiVersion: v1
kind: ConfigMap
metadata:
  name: higress-https
  namespace: higress-system
data:
  cert: |
    # Enable automatic global certificate management
    automaticHttps: true
    # How many days in advance to renew certificates when using automatic issuance
    renewBeforeDays: 30
    # Configure the ACME issuer for automatic issuance; only Let's Encrypt is supported for now
    acmeIssuer:
    - name: letsencrypt
      email: [email protected]
    credentialConfig:
    # Use Let's Encrypt to issue a certificate for foo.com
    - domains:
      - foo.com
      tlsIssuer: letsencrypt
      tlsSecret: foo-com-secret
    # Use a specific secret for the matching domains
    - domains:
      - statica.example.org
      - staticb.example.org
      tlsSecret: static-example-org-certificate
    # Fallback certificate, used for domains that do not match any rule above
    - domains:
      - "*"
      tlsSecret: default-certificate

 

In this way, all Secrets are managed centrally under the higress-system namespace but can apply to Ingresses in any namespace (without configuring the secret field on each Ingress), which both reduces the operation and maintenance burden and improves the security of certificate management.

 

As the above configuration also shows, Higress supports Let's Encrypt for automatic issuance and renewal of free HTTPS certificates without relying on cert-manager. This capability is available whether Higress is deployed in K8s or in standalone mode (where the ConfigMap is configured via a local file).

 

3.3 Providing Cluster Flow Control Plugin

In version 1.4, Higress supports accessing Redis services from Wasm plugins. Based on this capability, community contributor Han Xiantao (GitHub ID: hanxiantao) implemented the cluster-key-rate-limit plugin on top of the original key-rate-limit plugin, enabling precise global rate limiting based on Redis. In addition, while the original plugin only supported rate limiting on enumerable values, the new plugin also supports non-enumerable values, for example computing an independent limit for each IP or each cookie. The AI rate-limiting plugin is also implemented as an extension of this capability.
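For illustration, a cluster-wide per-IP limit might be configured as follows; this is a sketch modeled on the token rate-limit example in section 2.5, and field names such as query_per_minute are assumptions to check against the plugin documentation:

rule_name: global_ip_limit
rule_items:
- limit_by_per_ip: from-remote-addr   # keep an independent counter for each client IP
  limit_keys:
  - key: 0.0.0.0/0
    query_per_minute: 60              # each IP may send at most 60 requests per minute
redis:
  service_name: redis.static          # counters live in Redis, shared across gateway instances
  service_port: 6379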

 

3.5 Providing Log Observation

Higress's out-of-the-box o11y (observability) suite adds log collection and analysis capabilities, which can be enabled through the following Helm install/upgrade commands:

 

helm repo add higress https://higress.cn/helm-charts
# Install
helm install higress higress/higress --set global.o11y.enabled=true -n higress-system --create-namespace
# Upgrade
helm upgrade higress higress/higress --set global.o11y.enabled=true --reuse-values -n higress-system --create-namespace

 

Once enabled, you can view access logs in the Higress console and analyze them using Loki:

 

 

3.6 Support for Minimal Deployment

Higress can now be started as a single Docker container, which makes it easy for individual developers to set up a local learning environment or build a personal site. The startup method is as follows:

 

# Create a working directory
mkdir higress; cd higress
# Start Higress; configuration files will be written to the working directory
docker run -d --rm --name higress-ai -v ${PWD}:/data \
    -p 8001:8001 -p 8080:8080 -p 8443:8443 \
    higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one:1.4.1

 

The listening ports are as follows:

- Port 8001: Higress UI console entry
- Port 8080: gateway HTTP protocol entry
- Port 8443: gateway HTTPS protocol entry
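After startup, a quick smoke test might look like this (assuming the default port mappings above; the gateway returns 404 until a route is configured in the console):

# The console is available at http://localhost:8001
# Send a test request through the gateway's HTTP entry
curl -i http://localhost:8080/ -H 'Host: example.com'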

 

All Higress Docker images are hosted in Higress's own image registry and are not affected by the unavailability of Docker Hub in China.

 

Participate in the Higress Community

We welcome more partners to participate in building the Higress community. Recent community activities include:

225,000 Prize Pool | Higress AI Gateway Programming Challenge Launched

Three Higress topics at the GLCC Open Source Summer Camp, all with $6,000 in prizes: https://www.gitlink.org.cn/glcc/2024/projects

 

To learn more about the community, you can join the Higress WeChat/DingTalk group (DingTalk group number: 30735012403).

Related links:

[1] Alibaba Cloud Content Security

[2] CEC-Corpus
