commit | 9b867fb6188e91f3b2cd04921acc919f648a4152 | |
---|---|---|
author | Stefan Hengl <stefan.hengl@gmail.com> | Fri Oct 23 17:20:20 2020 +0200 |
committer | Han-Wen Nienhuys <hanwen@google.com> | Thu Nov 19 16:23:21 2020 +0000 |
tree | 8c7966fc6997f1a4695ba66798001659a47fbdc1 | |
parent | 1539ef83faa632f08d947a67e2c017dc049e1813 | |
Use newline index for regexp content searches

For content searches trying to match multiple terms on the same line, we check whether the matches of the individual terms intersect before calling the regex engine. If they don't intersect, we skip the document.

This optimisation is useful whenever terms of the query appear often in the same document but rarely on the same line.

The table below shows latencies of search queries of master vs. this change on a set of medium-sized repos. All latencies are reported in ms.

+-------------------------------------+-------------+-----------------+-----------+
| Query                               | Master (ms) | Patchset 7 (ms) | Delta (%) |
+-------------------------------------+-------------+-----------------+-----------+
| (func).*?(rtoorg)                   | 21.43       | 2.67            | 87.56     |
| (func).*?(bool).*?(bar)             | 723.60      | 170.1           | 76.49     |
| (func).*?(bool).*?(int32).*?(error) | 1320        | 186.2           | 85.89     |
| (func).*?(return)                   | 236.97      | 73.73           | 68.88     |
| (config).*?(override)               | 827.90      | 328.67          | 60.30     |
+-------------------------------------+-------------+-----------------+-----------+

Change-Id: Ic4e4c3e7a4ad61ec56efada35a70aa14d0766ff7
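The pre-filter described in the commit message can be sketched roughly as follows. This is a minimal illustration in Go under stated assumptions, not the code from this change: the names `newlineIndex`, `lineOf`, `findAll`, and `shareLine` are hypothetical, and plain substring matches stand in for the term matches the real index would supply.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// newlineIndex returns the byte offsets of every '\n' in content, in order.
func newlineIndex(content []byte) []int {
	var nls []int
	for i, b := range content {
		if b == '\n' {
			nls = append(nls, i)
		}
	}
	return nls
}

// lineOf returns the zero-based line containing byte offset off, computed as
// the number of newlines strictly before off (binary search over nls).
func lineOf(nls []int, off int) int {
	return sort.SearchInts(nls, off)
}

// findAll returns the start offsets of all occurrences of term in content.
// Hypothetical stand-in for the per-term matches produced by an index.
func findAll(content, term []byte) []int {
	if len(term) == 0 {
		return nil
	}
	var offs []int
	for start := 0; ; {
		i := bytes.Index(content[start:], term)
		if i < 0 {
			return offs
		}
		offs = append(offs, start+i)
		start += i + 1
	}
}

// shareLine reports whether at least one line contains a match of every term.
// matches[i] holds the match offsets of term i. If it returns false, a regexp
// that needs all terms on one line cannot match, so the document can be skipped.
func shareLine(nls []int, matches [][]int) bool {
	if len(matches) == 0 {
		return false
	}
	lines := map[int]bool{}
	for _, off := range matches[0] {
		lines[lineOf(nls, off)] = true
	}
	for _, offs := range matches[1:] {
		next := map[int]bool{}
		for _, off := range offs {
			if l := lineOf(nls, off); lines[l] {
				next[l] = true
			}
		}
		if len(next) == 0 {
			return false
		}
		lines = next
	}
	return len(lines) > 0
}

func main() {
	doc := []byte("func a() {\n\treturn 1\n}\nfunc b() bool { return true }\n")
	nls := newlineIndex(doc)
	terms := [][]int{findAll(doc, []byte("func")), findAll(doc, []byte("bool"))}
	fmt.Println(shareLine(nls, terms)) // true: both terms occur on the last line
}
```

The check is only a necessary condition: it ignores the order of the terms on the line, so a document that passes still has to go through the regex engine.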
"Zoekt, en gij zult spinazie eten" - Jan Eertink ("seek, and ye shall eat spinach" - My primary school teacher)
This is a fast text search engine, intended for use with source code. (Pronunciation: roughly as you would pronounce “zooked” in English)
Downloading:

    go get github.com/google/zoekt/

Indexing:

    go install github.com/google/zoekt/cmd/zoekt-index
    $GOPATH/bin/zoekt-index .

Searching:

    go install github.com/google/zoekt/cmd/zoekt
    $GOPATH/bin/zoekt 'ngram f:READ'

Indexing git repositories:

    go install github.com/google/zoekt/cmd/zoekt-git-index
    $GOPATH/bin/zoekt-git-index -branches master,stable-1.4 -prefix origin/ .

Indexing repo repositories:

    go install github.com/google/zoekt/cmd/zoekt-{repo-index,mirror-gitiles}
    zoekt-mirror-gitiles -dest ~/repos/ https://gfiber.googlesource.com
    zoekt-repo-index \
        -name gfiber \
        -base_url https://gfiber.googlesource.com/ \
        -manifest_repo ~/repos/gfiber.googlesource.com/manifests.git \
        -repo_cache ~/repos \
        -manifest_rev_prefix=refs/heads/ --rev_prefix= \
        master:default_unrestricted.xml
Starting the web interface:

    go install github.com/google/zoekt/cmd/zoekt-webserver
    $GOPATH/bin/zoekt-webserver -listen :6070

A more organized installation on a Linux server should use a systemd unit file, e.g.:

    [Unit]
    Description=zoekt webserver

    [Service]
    ExecStart=/zoekt/bin/zoekt-webserver -index /zoekt/index -listen :443 --ssl_cert /zoekt/etc/cert.pem --ssl_key /zoekt/etc/key.pem
    Restart=always

    [Install]
    WantedBy=default.target
Zoekt comes with a small service management program:
    go install github.com/google/zoekt/cmd/zoekt-indexserver

    cat << EOF > config.json
    [{"GithubUser": "username"},
     {"GithubOrg": "org"},
     {"GitilesURL": "https://gerrit.googlesource.com", "Name": "zoekt" }
    ]
    EOF

    $GOPATH/bin/zoekt-indexserver -mirror_config config.json
This mirrors and indexes all repositories under ‘github.com/username’ and ‘github.com/org’, as well as the ‘zoekt’ repository on gerrit.googlesource.com.
It takes care of fetching and indexing new data and cleaning up logfiles.
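To sanity-check a mirror configuration before handing it to the service, the JSON can be decoded and summarized. The sketch below is an illustration only: the `mirrorEntry` struct and the `parseMirrorConfig` and `describe` helpers are hypothetical names, and the struct covers just the four keys that appear in the example above, not necessarily everything the service accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mirrorEntry is a hypothetical struct covering only the keys shown in the
// example config.json; it is not the type used by the service itself.
type mirrorEntry struct {
	GithubUser string
	GithubOrg  string
	GitilesURL string
	Name       string
}

// parseMirrorConfig decodes a JSON array of mirror entries.
func parseMirrorConfig(data []byte) ([]mirrorEntry, error) {
	var entries []mirrorEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	return entries, nil
}

// describe returns a short human-readable summary of one entry.
func describe(e mirrorEntry) string {
	switch {
	case e.GithubUser != "":
		return "github.com/" + e.GithubUser
	case e.GithubOrg != "":
		return "github.com/" + e.GithubOrg
	case e.GitilesURL != "":
		return e.GitilesURL + " (name: " + e.Name + ")"
	}
	return "unrecognized entry"
}

func main() {
	cfg := []byte(`[{"GithubUser": "username"},
 {"GithubOrg": "org"},
 {"GitilesURL": "https://gerrit.googlesource.com", "Name": "zoekt" }
]`)
	entries, err := parseMirrorConfig(cfg)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(describe(e))
	}
}
```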
The webserver can be started from a standard service management framework, such as systemd.
It is recommended to install Universal ctags to improve ranking. See the ctags documentation in this repository for more information.
Thanks to Alexander Neubeck for coming up with this idea, and helping me flesh it out.
This is not an official Google product