* Using swish-e to index org files as html
:PROPERTIES:
:categories: emacs,search
:date: 2015/07/03 10:13:11
:updated: 2015/07/03 10:13:11
:END:

When we wrote about using swish-e [[http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/][before]], we just indexed the org files as text. This worked pretty well, but we lost some resolution, e.g. the ability to search for text in a headline. That is more feasible if we index html or xml, so here we try indexing the org files as html. Indexing will be slower, because each org file has to be filtered through a command that exports it to html, but hopefully the enhanced search capability will be worth it.

We need a filter command that takes an org file and spits out html on stdout. Here that command is written as an [[http://kitchingroup.cheme.cmu.edu/blog/2014/08/06/Writing-scripts-in-Emacs-lisp/][emacs-lisp script]]. It is a pretty bare-bones export: it does not load org-ref, so my custom org-ref links will not be exported properly. I tried loading org-ref (the commented-out lines below), but org-ref prints a lot to stdout when it loads, and that output would end up in the html. Until I figure out how to suppress it, I am leaving org-ref out.

#+BEGIN_SRC emacs-lisp :tangle org2html.el :tangle-mode (identity #o755)
:;exec emacs -batch -l $0 -f main "$@"
(require 'org)

;; org-ref is not loaded because it prints to stdout when it loads.
;(add-to-list 'load-path "/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa")
;(add-to-list 'load-path "/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref")
;(setq package-user-dir "/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa")
;(package-initialize)
;(require 'org-ref)

(defun main ()
  ;; The org file to convert is the first command-line argument.
  (find-file (car command-line-args-left))
  ;; Export only the body (4th argument, BODY-ONLY, is t) into the
  ;; "*Org HTML Export*" buffer.
  (org-html-export-as-html nil nil nil t)
  (switch-to-buffer "*Org HTML Export*")
  ;; princ writes the raw html to stdout; print would wrap it in
  ;; double quotes and escape every double quote inside it.
  (princ (buffer-string)))

;; Local Variables:
;; mode: emacs-lisp
;; End:
#+END_SRC

We try it out here:

#+BEGIN_SRC sh
./org2html.el index-org-as-html.org
#+END_SRC

#+RESULTS:
#+begin_example
"
When we wrote about using swish-e before, we just indexed the org files as text. This worked pretty well, but we lost some resolution, e.g. being able to search for text in a headline. that is more possible if we index html or xml. So, here we try indexing the org files as html. It will be slower to index because we will filter each org file through a command that exports it to html, but hopefully it will be worth it for the enhanced search capability.
We will need a filter shell command that takes an org-file and spits out html. This command is shown as an emacs-lisp script here. This is a pretty bare bones export, and would lack the export of all my custom links
cite:dauenhauer-2006-renew
:;exec emacs -batch -l $0 -f main \"$@\"
(require 'org)
;(add-to-list 'load-path \"/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa\")
;(add-to-list 'load-path \"/Users/jkitchin/Dropbox/kitchingroup/jmax/org-ref\")
;(setq package-user-dir \"/Users/jkitchin/Dropbox/kitchingroup/jmax/elpa\")
;(package-initialize)
;(require 'org-ref)
(defun main ()
  (find-file (car command-line-args-left))
  (org-html-export-as-html nil nil nil t)
  (switch-to-buffer \"*Org HTML Export*\")
  (print (buffer-string)))
;; Local Variables:
;; mode: emacs-lisp
;; End:
./org2html.el index-org-as-html.org
#+end_example
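swish-e indexes whatever a filter program writes to stdout, so it is worth checking what the filter actually emits before building an index. Below is a minimal sh sketch that is not part of the original setup: the function name =check_filter= is hypothetical, and it only looks at the first non-blank character of the output. Raw html starts with =<=, while output produced with Emacs' =print= starts with a double quote, because =print= writes a Lisp-readable string.

#+BEGIN_SRC sh
#!/bin/sh
# Hypothetical helper (not from the original post): run an org-to-html
# filter on one file, the same way as "./org2html.el index-org-as-html.org",
# and check that what it prints on stdout looks like raw html.
check_filter () {
    filter=$1   # filter program, e.g. ./org2html.el
    file=$2     # org file to convert
    # Delete all whitespace, then keep only the first remaining character.
    first=$("$filter" "$file" | tr -d '[:space:]' | cut -c1)
    case $first in
        '<') return 0 ;;   # starts with a tag, so it looks like html
        '"') echo "$file: filter printed a quoted Lisp string (print instead of princ?)" >&2
             return 1 ;;
        *)   echo "$file: filter output does not start with an html tag" >&2
             return 1 ;;
    esac
}
#+END_SRC

For example, =check_filter ./org2html.el index-org-as-html.org && echo ok= prints =ok= when the export looks like html, and explains the problem on stderr otherwise.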
Here is the swish-e configuration file. The =FileFilter= line runs each .org file through org2html.el, and swish-e indexes the html that the script prints.

#+BEGIN_SRC text
# Example configuration file

# Tell Swish-e which directories to index
IndexDir /Users/jkitchin/blogofile-jkitchin.github.com

# Where to save the index
IndexFile /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/index.swish-e

# What to index
IndexOnly .org

# Tell Swish-e to parse the filtered .org files with the HTML parser.
IndexContents HTML* .org

# Convert each .org file to html before it is parsed
FileFilter .org /Users/jkitchin/blogofile-jkitchin.github.com/_blog/swish-org/org2html.el

# Ask libxml2 to report any parsing errors and warnings or
# any UTF-8 to 8859-1 conversion errors
ParserWarnLevel 9
#+END_SRC
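The paths in this config are hard-coded for one machine. As a hedged sketch, the sh function below writes a config with these directives for any directory layout; the name =write_swish_conf= and its three arguments (org directory, directory holding org2html.el and the index, output file) are hypothetical. It sets =IndexContents HTML* .org= so that swish-e parses the filtered output as html.

#+BEGIN_SRC sh
#!/bin/sh
# Hypothetical helper: write a swish-e config for an arbitrary layout
# instead of hard-coding the paths.
write_swish_conf () {
    org_dir=$1     # directory tree holding the .org files to index
    swish_dir=$2   # directory holding org2html.el and the index file
    conf=$3        # path of the config file to write
    # Unquoted EOF, so $org_dir and $swish_dir are expanded.
    cat > "$conf" <<EOF
# Directory tree to index
IndexDir $org_dir
# Where to save the index
IndexFile $swish_dir/index.swish-e
# Only index .org files
IndexOnly .org
# Parse the filtered .org files with the HTML parser
IndexContents HTML* .org
# Convert each .org file to html with the Emacs script first
FileFilter .org $swish_dir/org2html.el
# Ask libxml2 to report parsing errors and warnings
ParserWarnLevel 9
EOF
}
#+END_SRC

For example, =write_swish_conf ~/blog ~/blog/swish-org swish.conf= writes the config, =swish-e -c swish.conf= builds the index, and =swish-e -f ~/blog/swish-org/index.swish-e -w headline= searches it.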