407 Commits

Author SHA1 Message Date
abhishek9686 5184f448c5 NM-311: Remove support for the -V installer flag in scripts/nm-quick.sh.
The script now always uses the default LATEST/BRANCH values and no longer
accepts user-specified release overrides.
2026-04-09 14:23:11 +05:30
abhishek9686 59a605d173 NM-311: get metrics interval from settings 2026-04-07 23:25:04 +05:30
Abhishek Kondur 12cc967ba1 Fixes/v1.5.1 (#3938)
* fix(go): set persistent keep alive when registering host using sso;

* fix(go): run posture check violations on delete;

* fix(go): upsert node on approving pending host;

* fix(go): resolve concurrency issues during group delete cleanup;

* fix(go): update doc links;

* fix(go): add created and updated fields to host;

* fix(go): skip delete and update superadmin on sync users;

* fix(go): use conn directly for now;

* fix(go): remove acl for idp groups;

* fix(go): quote fields;

* fix(go): use filters with count;

* feat(go): add a search query;

* fix(go): cleanup acls;

* fix(go): review fixes;

* fix(go): remove additional loop;

* fix(go): fix

* v1.5.1: separate out idp sync and reset signals for HA

* v1.5.1: add grps with name for logging

* v1.5.1: clear posture check violations when all checks are deleted

* v1.5.1: set static when default host

* v1.5.1: fix db status check

* rm set max conns

* v1.5.1: reset auto assigned gw when disconnected

* fix(go): skip global network admin and user groups when splitting;

* v1.5.1: fix update node call from client

* fix(go): separate out migration from normal usage;

* fix(go): skip default groups;

* fix(go): create policies for existing groups on network create;

* fix(go): skip fatal log on clickhouse conn;

* fix(go): add posture check cleanup;

---------

Co-authored-by: VishalDalwadi <dalwadivishal26@gmail.com>
Co-authored-by: Vishal Dalwadi <51291657+VishalDalwadi@users.noreply.github.com>
2026-03-28 01:08:59 +05:30
Abhishek Kondur 292af315dd NM-271: Scalability Improvements (#3921)
* feat(go): add user schema;

* feat(go): migrate to user schema;

* feat(go): add audit fields;

* feat(go): remove unused fields from the network model;

* feat(go): add network schema;

* feat(go): migrate to network schema;

* refactor(go): add comment to clarify migration logic;

* fix(go): test failures;

* fix(go): test failures;

* feat(go): change membership table to store memberships at all scopes;

* feat(go): add schema for access grants;

* feat(go): remove nameservers from new networks table; ensure db passed for schema functions;

* feat(go): set max conns for sqlite to 1;

* fix(go): issues updating user account status;

* NM-236: streamline operations in HA mode

* NM-236: only master pod should subscribe to updates from clients

* refactor(go): remove converters and access grants;

* refactor(go): add json tags in schema models;

* refactor(go): rename file to migrate_v1_6_0.go;

* refactor(go): add user groups and user roles tables; use schema tables;

* refactor(go): inline get and list from schema package;

* refactor(go): inline get network and list users from schema package;

* fix(go): staticcheck issues;

* fix(go): remove test not in use; fix test case;

* fix(go): validate network;

* fix(go): resolve static checks;

* fix(go): new models errors;

* fix(go): test errors;

* fix(go): handle no records;

* fix(go): add validations for user object;

* fix(go): set correct extclient status;

* fix(go): test error;

* feat(go): make schema the base package;

* feat(go): add host schema;

* feat(go): use schema host everywhere;

* feat(go): inline get host, list hosts and delete host;

* feat(go): use non-ptr value;

* feat(go): use save to upsert all fields;

* feat(go): use save to upsert all fields;

* feat(go): save turn endpoint as string;

* feat(go): check for gorm error record not found;

* fix(go): test failures;

* fix(go): update all network fields;

* fix(go): update all network fields;

* feat(go): add paginated list networks api;

* feat(go): add paginated list users api;

* feat(go): add paginated list hosts api;

* feat(go): add pagination to list groups api;

* fix(go): comment;

* fix(go): implement marshal and unmarshal text for custom types;

* fix(go): implement marshal and unmarshal json for custom types;

* fix(go): just use the old model for unmarshalling;

* fix(go): implement marshal and unmarshal json for custom types;

* NM-271:Import swap: compress/gzip replaced with github.com/klauspost/compress/gzip (2-4x faster, wire-compatible output). Added sync import.
Two sync.Pool variables (gzipWriterPool, bufferPool): reuse gzip.Writer and bytes.Buffer across calls instead of allocating fresh ones per publish.
compressPayload rewritten: pulls writer + buffer from pools, resets them, compresses at gzip.BestSpeed (level 1), copies the result out of the pooled buffer, and returns both objects to the pools.

* feat(go): remove paginated list networks api;

* feat(go): use custom paginated response object;

* NM-271: Improve server scalability under high host count

- Replace stdlib compress/gzip with klauspost/compress at BestSpeed and
  pool gzip writers and buffers via sync.Pool to eliminate compression
  as the dominant CPU hotspot.

- Debounce peer update broadcasts with a 500ms resettable window capped
  at 3s max-wait, coalescing rapid-fire PublishPeerUpdate calls into a
  single broadcast cycle.

- Cache HostPeerInfo (batch-refreshed by debounce worker) and
  HostPeerUpdate (stored as side-effect of each publish) so the pull API
  and peer_info API serve from pre-computed maps instead of triggering
  expensive per-host computations under thundering herd conditions.

- Warm both caches synchronously at startup before the first publish
  cycle so early pull requests are served instantly.

- Bound concurrent MQTT publishes to 5 via semaphore to prevent
  broker TCP buffer overflows that caused broken pipe disconnects.

- Remove manual Disconnect+SetupMQTT from ConnectionLostHandler and
  rely on the paho client's built-in AutoReconnect; add a 5s retry
  wait in publish() to ride out brief reconnection windows.

* NM-271: Reduce server CPU contention under high concurrent load

- Cache ServerSettings with atomic.Value to eliminate repeated DB reads
  on every pull request (was 32+ goroutines blocked on read lock)
- Batch UpdateNodeCheckin writes in memory, flush every 30s to reduce
  per-checkin write lock contention (was 88+ goroutines blocked)
- Enable SQLite WAL mode + busy_timeout and remove global dbMutex;
  let SQLite handle concurrency natively (reads no longer block writes)
- Move ResetFailedOverPeer/ResetAutoRelayedPeer to async in pull()
  handler since results don't affect the cached response
- Skip no-op UpsertNode writes in failover/relay reset functions
  (early return when node has no failover/relay state)
- Remove CheckHostPorts from hostUpdateFallback hot path
- Switch to pure-Go SQLite driver (glebarez/sqlite), set CGO_ENABLED=0

* fix(go): ensure default values for page and per_page are used when not passed;

* fix(go): rename v1.6.0 to v1.5.1;

* fix(go): check for gorm.ErrRecordNotFound instead of database.IsEmptyRecord;

* fix(go): use host id, not pending host id;

* NM-271: Revert pure-Go SQLite and FIPS disable to verify impact

Revert to CGO-based mattn/go-sqlite3 driver and re-enable FIPS to
isolate whether these changes are still needed now that the global
dbMutex has been removed and WAL mode is enabled. Keep WAL mode
pragma with mattn-compatible DSN format.

* feat(go): add filters to paginated apis;

* feat(go): add filters to paginated apis;

* feat(go): remove check for max username length;

* feat(go): add filters to count as well;

* feat(go): use library to check email address validity;

* feat(go): ignore pagination if params not passed;

* fix(go): pagination issues;

* fix(go): check exists before using;

* fix(go): remove debug log;

* NM-271: rm debug logs

* NM-271: check if caching is enabled

* NM-271: add server sync mq topic for HA mode

* NM-271: fix build

* NM-271: push metrics in batch to exproter over api

* NM-271: use basic auth for exporter metrics api

* fix(go): use gorm err record not found;

* NM-271: Add monitoring stack on demand

* NM-271: -m arg for install script should only add monitoring stack

* fix(go): use gorm err record not found;

* NM-271: update docker compose file for prometheus

* NM-271: update docker compose file for prometheus

* fix(go): use user principal name when creating pending user;

* fix(go): use schema package for consts;

* NM-236: rm duplicate network hook

* NM-271: add server topic to reset idp hooks on master node

* fix(go): prevent disabling superadmin user;

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): swap is admin and is superadmin;

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): remove dead code block;

https://github.com/gravitl/netmaker/pull/3910#discussion_r2928837937

* fix(go): incorrect message when trying to disable self;

https://github.com/gravitl/netmaker/pull/3910#discussion_r2928837934

* NM-271: fix stale peers on reset_failovered pull and add HTTP timeout to metrics exporter

Run the failover/relay reset synchronously in the pull handler so the
response reflects post-reset topology instead of serving stale cached
peers. Add a 30s timeout to the metrics exporter HTTP client to prevent
PushAllMetricsToExporter from blocking the Keepalive loop.

* NM-271: fix gzip pool corruption, MQTT topic mismatch, stale settings cache, and reduce redundant DB fetches

- Only return gzip.Writer to pool after successful Close to prevent
  silently malformed MQTT payloads from a previously errored writer.
- Fix serversync subscription to exact topic match since syncType is
  now in the message payload, not the topic path.
- Prevent zero-value ServerSettings from being cached indefinitely
  when the DB record is missing or unmarshal fails on startup.
- Return fetched hosts/nodes from RefreshHostPeerInfoCache so
  warmPeerCaches reuses them instead of querying the DB twice.
- Compute fresh HostPeerUpdate on reset_failovered pull instead of
  serving stale cache, and store result back for subsequent requests.

* NM-271: fix gzip writer pool leak, log checkin flush errors, and fix master pod ordinal parsing

- Reset gzip.Writer to io.Discard before returning to pool so errored
  writers are never leaked or silently reused with corrupt state.
- Track and log failed DB inserts in FlushNodeCheckins so operators
  have visibility when check-in timestamps are lost.
- Parse StatefulSet pod ordinal as integer instead of using HasSuffix
  to prevent netmaker-10 from being misidentified as master pod.

* NM-271: simplify masterpod logic

* fix(go): use correct header;

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): return after error response;

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): use correct order of params;

https://github.com/gravitl/netmaker/pull/3910#discussion_r2929593036

* fix(go): set default values for page and page size; use v2 instead of /list;

* NM-271: use host name

* Update mq/serversync.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* NM-271: fix duplicate serversynce case

* NM-271: streamline gw updates

* Update logic/auth.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* Update schema/user_roles.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): syntax error;

* fix(go): set default values when page and per_page are not passed or 0;

* fix(go): use uuid.parse instead of uuid.must parse;

* fix(go): review errors;

* fix(go): review errors;

* Update controllers/user.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* Update controllers/user.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* NM-163: fix errors:

* Update db/types/options.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): persist return user in event;

* Update db/types/options.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* NM-271: signal pull on ip changes

* NM-163: duplicate lines of code

* NM-163: fix(go): fix missing return and filter parsing in user controller

- Add missing return after error response in updateUserAccountStatus
  to prevent double-response and spurious ext-client side-effects
- Use switch statements in listUsers to skip unrecognized
  account_status and mfa_status filter values

* NM-271: signal pull req on node ip change

* fix(go): check for both min and max page size;

* NM-271: refresh node object before update

* fix(go): enclose transfer superadmin in transaction;

* fix(go): review errors;

* fix(go): remove free tier checks;

* fix(go): review fixes;

* NM-271: streamline ip pool ops

* NM-271: fix tests, set max idle conns

* NM-271: fix(go): fix data races in settings cache and peer update worker

- Use pointer type in atomic.Value for serverSettingsCache to avoid
  replacing the variable non-atomically in InvalidateServerSettingsCache
- Swap peerUpdateReplace flag before draining the channel to prevent
  a concurrent replacePeers=true from being consumed by the wrong cycle

---------

Co-authored-by: VishalDalwadi <dalwadivishal26@gmail.com>
Co-authored-by: Vishal Dalwadi <51291657+VishalDalwadi@users.noreply.github.com>
Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>
2026-03-18 00:24:54 +05:30
Abhishek Kondur edda2868fc NM-163: Users, Groups, Roles, Networks and Hosts Table Migration (#3910)
* feat(go): add user schema;

* feat(go): migrate to user schema;

* feat(go): add audit fields;

* feat(go): remove unused fields from the network model;

* feat(go): add network schema;

* feat(go): migrate to network schema;

* refactor(go): add comment to clarify migration logic;

* fix(go): test failures;

* fix(go): test failures;

* feat(go): change membership table to store memberships at all scopes;

* feat(go): add schema for access grants;

* feat(go): remove nameservers from new networks table; ensure db passed for schema functions;

* feat(go): set max conns for sqlite to 1;

* fix(go): issues updating user account status;

* refactor(go): remove converters and access grants;

* refactor(go): add json tags in schema models;

* refactor(go): rename file to migrate_v1_6_0.go;

* refactor(go): add user groups and user roles tables; use schema tables;

* refactor(go): inline get and list from schema package;

* refactor(go): inline get network and list users from schema package;

* fix(go): staticcheck issues;

* fix(go): remove test not in use; fix test case;

* fix(go): validate network;

* fix(go): resolve static checks;

* fix(go): new models errors;

* fix(go): test errors;

* fix(go): handle no records;

* fix(go): add validations for user object;

* fix(go): set correct extclient status;

* fix(go): test error;

* feat(go): make schema the base package;

* feat(go): add host schema;

* feat(go): use schema host everywhere;

* feat(go): inline get host, list hosts and delete host;

* feat(go): use non-ptr value;

* feat(go): use save to upsert all fields;

* feat(go): use save to upsert all fields;

* feat(go): save turn endpoint as string;

* feat(go): check for gorm error record not found;

* fix(go): test failures;

* fix(go): update all network fields;

* fix(go): update all network fields;

* feat(go): add paginated list networks api;

* feat(go): add paginated list users api;

* feat(go): add paginated list hosts api;

* feat(go): add pagination to list groups api;

* fix(go): comment;

* fix(go): implement marshal and unmarshal text for custom types;

* fix(go): implement marshal and unmarshal json for custom types;

* fix(go): just use the old model for unmarshalling;

* fix(go): implement marshal and unmarshal json for custom types;

* feat(go): remove paginated list networks api;

* feat(go): use custom paginated response object;

* fix(go): ensure default values for page and per_page are used when not passed;

* fix(go): rename v1.6.0 to v1.5.1;

* fix(go): check for gorm.ErrRecordNotFound instead of database.IsEmptyRecord;

* fix(go): use host id, not pending host id;

* feat(go): add filters to paginated apis;

* feat(go): add filters to paginated apis;

* feat(go): remove check for max username length;

* feat(go): add filters to count as well;

* feat(go): use library to check email address validity;

* feat(go): ignore pagination if params not passed;

* fix(go): pagination issues;

* fix(go): check exists before using;

* fix(go): remove debug log;

* fix(go): use gorm err record not found;

* fix(go): use gorm err record not found;

* fix(go): use user principal name when creating pending user;

* fix(go): use schema package for consts;

* fix(go): prevent disabling superadmin user;

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): swap is admin and is superadmin;

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): remove dead code block;

https://github.com/gravitl/netmaker/pull/3910#discussion_r2928837937

* fix(go): incorrect message when trying to disable self;

https://github.com/gravitl/netmaker/pull/3910#discussion_r2928837934

* fix(go): use correct header;

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): return after error response;

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): use correct order of params;

https://github.com/gravitl/netmaker/pull/3910#discussion_r2929593036

* fix(go): set default values for page and page size; use v2 instead of /list;

* Update logic/auth.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* Update schema/user_roles.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): syntax error;

* fix(go): set default values when page and per_page are not passed or 0;

* fix(go): use uuid.parse instead of uuid.must parse;

* fix(go): review errors;

* fix(go): review errors;

* Update controllers/user.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* Update controllers/user.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* NM-163: fix errors:

* Update db/types/options.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* fix(go): persist return user in event;

* Update db/types/options.go

Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>

* NM-163: duplicate lines of code

* NM-163: fix(go): fix missing return and filter parsing in user controller

- Add missing return after error response in updateUserAccountStatus
  to prevent double-response and spurious ext-client side-effects
- Use switch statements in listUsers to skip unrecognized
  account_status and mfa_status filter values

* fix(go): check for both min and max page size;

* fix(go): enclose transfer superadmin in transaction;

* fix(go): review errors;

* fix(go): remove free tier checks;

* fix(go): review fixes;

---------

Co-authored-by: VishalDalwadi <dalwadivishal26@gmail.com>
Co-authored-by: Vishal Dalwadi <51291657+VishalDalwadi@users.noreply.github.com>
Co-authored-by: tenki-reviewer[bot] <262613592+tenki-reviewer[bot]@users.noreply.github.com>
2026-03-17 19:36:52 +05:30
Vishal Dalwadi 73949b39dd Set location if not set on host (#3859)
* fix(go): set location if not set on host;

* fix(go): set location if not set on host;

* fix(go): set location if not set on host;

* fix(go): set location if not set on host;
2026-02-10 19:25:28 +04:00
Vishal Dalwadi f723fc5202 NM-214: Expect GeoInfo to come from Netclient (#3833)
* feat(go): expect geoinfo from netclient;

* feat(go): add geoinfo util;
2026-02-02 14:14:49 +04:00
Vishal Dalwadi a4981ffd26 NM-168: Network Flow Logs (#3754)
* feat(go): define flow events;

* feat(go): improve structure;

* feat(go): improve structure;

* feat(go): remove old flow definitions;

* feat(sql): add clickhouse init scripts;

* feat(sql): add protobuf spec;

* fix(sql): store ip as string;

* feat(go): move proto def to grpc dir;

* feat(go): use node instead of host as type; optimize protobuf defs;

* feat(go): add clickhouse db support; add endpoint to query flows;

* fix(go): fix clickhouse config;

* fix(go): use error response structure to report error;

* feat(go): pass flow logging status to netclient;

* feat(go): add peer ip identity map to host peer info;

* feat(go): remove prefix from participant obj fields;

* feat(go): add flow logs enabled field to host;

* feat(go): add filtering to get flow api;

* feat(go): fix record struct;

* feat(go): add exporter url to server config;

* feat(go): add exporter url to server config;

* feat(go): enable flow logs by default;

* feat(go): update nm-quick.sh;

* feat(go): update nm-quick.sh;

* feat(go): update nm-quick.sh;

* feat(go): update nm-quick.sh;

* feat(go): add db initialization logic;

* feat(go): filter by network id;

* fix(go): connection issue;

* fix(go): connection issue;

* fix(go): golang builder version;

* feat(go): add server settings for flow logs;

* feat(go): initialize clickhouse in pro; check for retention;

* feat(go): add exporter feature flags;

* feat(go): add grpc behind caddy;

* feat(go): expose ports correctly;

* fix(go): grpc caddyfile config;

* fix(go): publish exporter feature flags on license validation;

* fix(go): set server name for netmaker exporter;

* fix(go): set server name for netmaker exporter;

* fix(go): check for nil cancel func;

* fix(go): add flow logs field to api host;

* fix(go): add flow logs field to api host;

* fix(go): remove port from grpc setting;

* chore(go): tabs;

* feat(go): introduce egress range participant type;.

* feat(go): rename egress range to egress route for uniform language;

* feat(go): rename egress range to egress route for uniform language;

* feat: add peer addr identity map to host peer update;

* feat: add address identity map to host peer update;

* feat: add address identity map to host peer update;

* feat: set correct from and to args;

* feat: add support for filtering by node;

* feat: use corresponding base image;

* feat: update dockerfile base image version;

* fix: disable flow logs for all host when global settings are changed;
2025-12-12 14:12:00 +04:00
Abhishek Kondur eed32cd2d6 Merge pull request #3735 from gravitl/NM-166
NM-166: Device Posture Checks
2025-12-05 10:33:11 +04:00
Vishal Dalwadi cf6ad4726b Merge pull request #3739 from gravitl/fix/dns
Unique DNS Subscriptions
2025-11-21 12:17:51 +04:00
Abhishek K 74fef9fbc6 NM-122: Auto Relay, auto assignment of Gw (#3697)
* add auto realy handlers and logic funcs

* add pro func connectors

* Add auto relayed peer ips on peer update, set auto relay on gw creation

* add network id to signal, add autorelay nodes to peerudpate

* add autorelay peer update logic

* add nodes to peer update

* revert node model change

* reset auto relayed peers on the relay node on reset, add auto relay nodes to pull

* add logic api to update auto relay node

* add autoassigngw field to node, add logic to swith relay node in relayme udpate api

* add gw nodes to pull

* intilaise gw map

* HA relay functionality

* add autoassign gw option to enrollment key

* publish intant action to auto assign gw

* fix static checks

* unset relay if auto assign removed

* add host node model to auto relay info

* add host node model to auto relay info

* only use hostNode model for gws info

* handle autoassigned gw peer in the update

* handle autoassigned gw peer in the update

* handle peer updates for autoassigned gw peer

* unset auto assigned peer if relayed or failedovered
2025-10-28 09:53:31 +04:00
Abhishek K c5b48db2a1 NM-125: Egress HA by Latency, Allow Tags to be selected as routing peers (#3698)
* enable egress routing peers with tags

* remove tag from egress when deleted

* fix egress tag functionality

* filter duplicate egress ips

* set default stun server if unset

* add version to status api

* sync deleted node udpate host deletion
2025-10-25 23:49:21 +04:00
Abhishek K 49e28e3385 NM-137: Add addtional mq actions to host api (#3671)
* add host node update action

* add peer signal action to fallback api

* add replace peers to host pull

* add delete host action to fallback api

* update base go builder image

* update go builder tag

* check host port to avoid conflicts behind NAT

* fix connect/disconnect on api

* send pull signal on disconnect from UI

* fix panic on host join via user auth

* reset failover on disconnect
2025-10-07 13:16:31 +04:00
Abhishek K 9e0196126f NM-79: Domain Based Egress Routing (#3607)
* add support for egress domain routing

* add domain info to egress range

* fix egress domain update

* send peer update domain resolution update

* add egress domain update in the peer update

* use range field for domain check

* add egress domain to host pull

* add egress domain model to egress host update

* add egress domain model to egress host update

* update egress domain model on acls

* add check of range if domain is set

* sync egress domains to dns system

* add egress domain to match domain list, fix egress nat rule for domains

* fix all rsrcs comms

* fix static checks

* fix egress acls on CE

* check for all resources access on a node

* simplify egress acl rules

* merged ce and pro acl rule func

* fix uni direction acl rule for static nodes

* allow relayed nodes traffic

* resolve merge conflicts

* remove anywhere dst rule on user node acls

* fix: broadcast  user groups update for acl changes

* add egress domain ans routes to nodes

* add egress ranges to DST

* add all egress ranges for all resources

* fix DNS routing acls rules
2025-09-11 15:24:17 +05:30
abhishek9686 7688bc3ebc resolve merge conflicts 2025-08-29 11:37:27 +05:30
Abhishek K 461c680099 NM-15: sync device interfaces on checkin (#3548)
* sync devices interface on checkin

* deep compare ifaces on checkin
2025-07-27 08:29:37 +05:30
abhishek9686 e65f6728d4 set location if empty on checkin 2025-07-04 00:43:08 +05:30
Abhishek K aca911712b avoid setting nil endpoint if peer using internet gw (#3529) 2025-06-25 19:17:57 +05:30
abhishek9686 d978de08d0 collect host localtion for graph 2025-06-12 15:47:24 +05:30
Abhishek K 309e4795a1 NET-1950: Persist Server Settings in the DB (#3419)
* feat: api access tokens

* revoke all user tokens

* redefine access token api routes, add auto egress option to enrollment keys

* add server settings apis, add db table for settigs

* handle server settings updates

* switch to using settings from DB

* fix sever settings migration

* revet force migration for settings

* fix server settings database write

* fix revoked tokens to be unauthorized

* remove unused functions

* convert access token to sql schema

* switch access token to sql schema

* fix merge conflicts

* fix server settings types

* bypass basic auth setting for super admin

* add TODO comment

* publish peer update on settings update

* chore(go): import style changes from migration branch;

1. Singular file names for table schema.
2. No table name method.
3. Use .Model instead of .Table.
4. No unnecessary tagging.

* remove nat check on egress gateway request

* Revert "remove nat check on egress gateway request"

This reverts commit 0aff12a189.

* feat(go): add db middleware;

* feat(go): restore method;

* feat(go): add user access token schema;

* fix user auth api:

* re initalise oauth and email config

* set verbosity

* sync auto update settings with hosts

* sync auto update settings with hosts

* mask secret and convert jwt duration to minutes

* convert jwt duration to minutes

* notify peers after settings update

* compare with curr settings before updating

* send host update to devices on auto update

---------

Co-authored-by: Vishal Dalwadi <dalwadivishal26@gmail.com>
2025-04-30 02:34:10 +04:00
Abhishek K e13bf2c0eb NET-1923: Add Metric Port to server config (#3306)
* set default metrics port 8889

* set default metrics port 51821

* add metrics port to server config

* bind caddy only on tcp

* add var for pulling files

* add new line

* update peer update model

* check if port is not zero

* set replace peer to false on pull

* do not replace peers on failover sync

* remove debug log

* add old peer update fields for backwards compatibility

* add old json tag

* add debug log in caller trace func
2025-02-04 08:44:24 +04:00
Yabin Ma 9024aead60 add back compatibility for encrypt message (#3246) 2024-12-10 12:47:05 +04:00
Abhishek K 31c2311bef NET-1782: Fetch Node Connection Status from metrics (#3237)
* add live status of node

* handle static node status

* add public IP field to server configuration

* get public Ip from config

* improve node status logic

* improvise status check

* use only checkin status on old nodes

---------

Co-authored-by: the_aceix <aceixsmartx@gmail.com>
2024-12-10 10:46:05 +04:00
Yabin Ma 5f21c8bb1d NET-1778: scale test code changes (#3203)
* comment ACL call and add debug message

* add cache for network nodes

* fix load node to network cache issue

* add peerUpdate call 1 min limit

* add debug log for scale test

* release maps

* avoid default policy for node

* 1 min limit for peerUpdate trigger

* mq options

* Revert "mq options"

This reverts commit 10b93d0118.

* set peerUpdate run in sequence

* update for emqx 5.8.2

* remove batch peer update

* change the sleep to 10 millisec to avoid timeout

* add compress and change encrypt for peerUpdate message

* add mem profiling and automaxprocs

* add failover ctx mutex

* ignore request to failover peer

* remove code without called

* remove debug logs

* update emqx to v5.8.2

* change broker keepalive

* add OLD_ACL_SUPPORT setting

* add host version check for message encrypt

* remove debug message

* remove peerUpdate call control

---------

Co-authored-by: abhishek9686 <abhi281342@gmail.com>
2024-12-10 10:15:31 +04:00
Max Ma 5c15f3d9eb NET-1603: Manage DNS NM changes (#3124)
* add switch for manage dns

* manage DNS sync publish

* add dns sync api

* add manageDNS field in peerUpdate

* add default dns for extClent if manage dns enabled

* add DEFAULT_DOMAIN for internal DNS lookup

* move DNSSync to peerUpdate

* fix empty host in network issue

* sync up dns when custom dns add/delete

* fix custom DNS ip4/ipv6 validator issue
2024-10-29 13:53:45 +04:00
Abhishek K 5a561b3835 Net 1440 batchpeerupdate (#3042)
* NET-1440 scale test changes

* fix UT error and add error info

* load metric data into cacha in startup

* remove debug info for metric

* add server telemetry and hasSuperAdmin to cache

* fix user UT case

* update sqlite connection string for performance

* update check-in TS in cache only if cache enabled

* update metric data in cache only if cache enabled and write to DB once in stop

* update server status in mq topic

* add failover existed to server status update

* only send mq messsage when there is server status change

* batch peerUpdate

* code changes for scale for review

* update UT case

* update mq client check

* mq connection code change

* revert server status update changes

* revert batch peerUpdate

* remove server status update info

* batch peerUpdate

* code changes based on review and setupmqtt in keepalive

* set the mq message order to false for PIN

* remove setupmqtt in keepalive

* add peerUpdate batch size to config

* update batch peerUpdate

* recycle ip in node deletion

* update ip allocation logic

* remove ip addr cap

* remove ippool file

* update get extClient func

* remove ip from cache map when extClient is removed

* add batch peerUpdate switch

* set batch peerUpdate to true by default

---------

Co-authored-by: Max Ma <mayabin@gmail.com>
2024-08-16 15:35:43 +05:30
Max Ma 46b8fd21c8 NET-1440: scale test changes (#3014)
* NET-1440 scale test changes

* fix UT error and add error info

* load metric data into cacha in startup

* remove debug info for metric

* add server telemetry and hasSuperAdmin to cache

* fix user UT case

* update sqlite connection string for performance

* update check-in TS in cache only if cache enabled

* update metric data in cache only if cache enabled and write to DB once in stop

* update server status in mq topic

* add failover existed to server status update

* only send mq messsage when there is server status change

* batch peerUpdate

* code changes for scale for review

* update UT case

* update mq client check

* mq connection code change

* revert server status update changes

* revert batch peerUpdate

* remove server status update info

* code changes based on review and setupmqtt in keepalive

* set the mq message order to false for PIN

* remove setupmqtt in keepalive

* recycle ip in node deletion

* update ip allocation logic

* remove ip addr cap

* remove ippool file

* update get extClient func

* remove ip from cache map when extClient is removed
2024-08-15 11:59:00 +05:30
Max Ma 65faf73fe9 NET-1226: Scalability Improvements (#2987)
* add api to check if failover node existed

* remove 5 minute peerUpdate

* update peerUpdate to trigger pull

* update Action name to SignalPull

* revert the peerUpdate from SignalPull

* fix getfailover error issue

* rm acls creation for on-prem emqx

* remove use of acls

* add additional broker status field on status api

* NET-1165: Remove creation of acls on emqx (#2996)

* rm acls creation for on-prem emqx

* remove use of acls

* add additional broker status field on status api

* comment out mq reconnect logic

* configure mq conn params

* add metric_interval in ENV for publishing metrics

* add metric_interval in ENV for publishing metrics

* update PUBLISH_METRIC_INTERVAL env name

* revert the mq setttings back

* fix error nil issue

---------

Co-authored-by: abhishek9686 <abhi281342@gmail.com>
Co-authored-by: Abhishek K <32607604+abhishek9686@users.noreply.github.com>
2024-07-09 18:56:55 +05:30
Abhishek K 7ff30599ed NET-1252: Restrict inetGws, Relays from getting failedOver (#2937)
* add additional checks to avoid failovers

* add failover defence check on signal handler

* only add check for victim node

* avoid failover reset on pull

* add relayed for failoverme

* misc changes for failover

* remove resetfailoverpeers for InetNode

* add egress route back to allowedip list if relayed is egressGW

* add extclient back to allowedip list if peer is ingressGW

* reset failover on pull

---------

Co-authored-by: Max Ma <mayabin@gmail.com>
2024-06-03 10:49:02 +04:00
abhishek9686 72577a96d9 replace peer on first peer update 2024-05-17 12:02:47 +05:30
abhishek9686 3ed183344b send peer update on startup 2024-05-17 11:28:53 +05:30
Max Ma 86fac41868 fix metric exporter mq connection issue (#2925) 2024-05-08 12:13:11 +05:30
guangwu 4a2e2190fc fix: close resp body (#2909) 2024-04-30 09:13:08 +05:30
abhishek9686 5ff9289462 set random id 2024-04-16 10:50:31 +05:30
Max Ma 961f8eab6e NET-1119 (#2886)
* exclude IngressGW in failover

* resetfailoverpeer when adding IngressGw if failover enabled

* exclude InetGW in failover

* get egress ranges of failedover peer

---------

Co-authored-by: abhishek9686 <abhi281342@gmail.com>
2024-04-12 18:22:03 +05:30
Abhishek K 66069fbc34 NET-1082: Scale Testing Fixes (#2894)
* add additional mutex lock on node acls func

* increase verbosity

* disable acls on cloud emqx

* add emqx creds creation to go routine

* add debug log of mq client id

* comment port check

* uncomment port check

* check for connection mq connection open

* use username for client id

* add write mutex on acl is allowed

* add mq connection lost handler on server

* spin off zombie init as go routine

* get whole api path from config

* Revert "get whole api path from config"

This reverts commit 392f5f4c5f.

* update extclient acls async

* add additional mutex lock on node acls func

(cherry picked from commit 5325f0e7d7)

* increase verbosity

(cherry picked from commit 705b3cf0bf)

* add emqx creds creation to go routine

(cherry picked from commit c8e65f4820)

* add debug log of mq client id

(cherry picked from commit 29c5d6ceca)

* comment port check

(cherry picked from commit db8d6d95ea)

* check for connection mq connection open

(cherry picked from commit 13b11033b0)

* use username for client id

(cherry picked from commit e90c7386de)

* add write mutex on acl is allowed

(cherry picked from commit 4cae1b0bb4)

* add mq connection lost handler on server

(cherry picked from commit c82918ad35)

* spin off zombie init as go routine

(cherry picked from commit 6d65c44c43)

* update extclient acls async

(cherry picked from commit 6557ef1ebe)

* additionl logs for oauth user flow

(cherry picked from commit 61703038ae)

* add more debug logs

(cherry picked from commit 5980beacd1)

* add more debug logs

(cherry picked from commit 4d001f0d27)

* add set auth secret

(cherry picked from commit f41cef5da5)

* fix fetch pass

(cherry picked from commit 825caf4b60)

* make sure auth secret is set only once

(cherry picked from commit ba33ed02aa)

* make sure auth secret is set only once

(cherry picked from commit 920ac4c507)

* comment usage of emqx acls

* replace  read lock with write lock on acls

* replace  read lock with write lock on acls

(cherry picked from commit 808d2135c8)

* use deadlock pkg for visibility

* add additional mutex locks

* remove race flag

* on mq re-connecting donot exit if failed

* on mq re-connecting donot exit if failed

* revert mutex package change

* set mq clean session

* remove debug log

* go mod tidy

* revert on prem emqx acls del
2024-04-11 21:18:57 +05:30
Max Ma 5740c3e009 Net 1115 (#2890)
* add endpointipv6 for host

* keep endpointipv6 unchanged when enable static endpoint

* handle ipv6 endpoint updates

---------

Co-authored-by: abhishek9686 <abhi281342@gmail.com>
2024-04-11 17:37:45 +05:30
Abhishek K c7e673fb9f ACC-532: set mq clean session to true (#2865)
* set clean session

* delete emqx hosts creds api

* add emqx hosts del api to limited middleware controller

* add emqx hosts del api to limited middleware controller

* remove server creds from emqx
2024-03-20 15:03:41 +07:00
abhishek9686 a5dff67bfd delete server user on start up to re-initialize 2024-03-05 11:48:24 +07:00
abhishek9686 01592b2ecc add additional logging 2024-02-29 19:58:18 +07:00
abhishek9686 4630925182 send pull syn over old mq for emqx migration 2024-02-29 13:30:51 +07:00
abhishek9686 eb28faf669 add emqx migration func 2024-02-28 17:57:25 +07:00
abhishek9686 b5db86d1a7 format emqx urls 2024-01-26 10:45:55 +05:30
abhishek9686 fb0fead2f0 create emqx for server, get app creds from env 2024-01-26 10:24:29 +05:30
abhishek9686 b40ce30af5 implement node acls cloud api func 2024-01-26 00:19:29 +05:30
abhishek9686 6d3de8d565 node acls 2024-01-25 22:27:57 +05:30
abhishek9686 c556e9d77d add host acls and create user cloud emqx funcs 2024-01-25 18:24:43 +05:30
abhishek9686 155f2887b2 implement emqx interface methods for cloud and on-prem 2024-01-25 15:11:16 +05:30
Abhishek K 465f2bd5be NET-896: Scale test bug fixes (#2764)
* send peer update in async

* update metrics on fallback

* return http json response
2024-01-15 23:17:36 +05:30
Abhishek K 5bf30b2c10 NET-877: Replace peers on Refreshkeys peer update (#2761)
* replace peers on key refresh

* add peer conf to metrics map only when allowed
2024-01-11 15:59:19 +05:30