General computing skills


Git version control

Version control in Windows

  • TortoiseGit: for Windows users, this application adds version control management and other features directly to the file explorer.

Adding submodules

Generally speaking, adding a submodule to a repository should be a simple matter of:

git submodule add https://<path>/<to>/<repository>

Nonetheless, this might fail, especially for large repositories. I faced this issue and tried to fix it by increasing the buffer size, as reported in the link. This solved the issue but led me to another problem, which could be solved by downgrading the HTTP protocol version.
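As a sketch of that workaround, the snippet below assembles the corresponding Git commands from Python. It assumes the two fixes map to the http.postBuffer and http.version configuration keys of Git; the buffer size (500 MiB), the function names and the dry-run behavior are illustrative choices, not values taken from the linked answers. Note that --global changes the configuration for every repository of the current user.

```python
import subprocess


def submodule_commands(url, post_buffer=524_288_000, http_version="HTTP/1.1"):
    """Return the Git commands (as argument lists) for the workaround.

    Assumption: the buffer and protocol fixes correspond to the
    http.postBuffer and http.version configuration keys.
    """
    return [
        # Increase the buffer used for HTTP transfers (value in bytes).
        ["git", "config", "--global", "http.postBuffer", str(post_buffer)],
        # Fall back from HTTP/2 to HTTP/1.1.
        ["git", "config", "--global", "http.version", http_version],
        # Finally add the submodule itself.
        ["git", "submodule", "add", url],
    ]


def add_submodule(url, dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for cmd in submodule_commands(url):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling add_submodule("https://<path>/<to>/<repository>") only prints the commands; pass dry_run=False from inside the parent repository to actually execute them.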


Working on Windows

Creating a portable launcher

A simple way to create a portable launcher that needs extra environment variables is to write a batch script that either sets them directly or calls another script containing the definitions:

@echo off

@REM Add variables to be sourced here such as
@REM set "PATH=C:\path\to\some\dir;%PATH%"
@REM ... or call another shared script doing so.
@REM call "%~dp0env"

MyCode.exe

Because a batch script keeps a console window open, create a VBScript (.vbs) file with the following content:

Set oShell = CreateObject ("Wscript.Shell") 
Dim strArgs
strArgs = "cmd /c MyCode.bat"
oShell.Run strArgs, 0, false

In this example we assume the program is called MyCode.exe and the batch script has been named analogously, MyCode.bat. Some real-world examples are provided here.

Mount a network drive in WSL

Here we assume we will mount drive Z: at /mnt/z:

# Create the mount point (if required):
sudo mkdir /mnt/z

# Mount the network drive in WSL:
sudo mount -t drvfs Z: /mnt/z

Following a file as it is written

In PowerShell, the equivalent of Linux tail -f <file-path> is:

Get-Content -Path "<file-path>" -Wait

Working on Linux

Several recent Linux distributions use GNOME 3 as the default desktop environment. A few innovations introduced by this environment are not really interesting, and falling back to classical modes is useful.


Regular expressions

Regular expression (or simply regex) processing is a must-have skill for anyone doing scientific computing. Most programs produce results or logs in plain text and do not support extracting specific data from them. That is where regex becomes your best friend. Unfortunately, over the years many flavors of regex have appeared, each claiming to offer advantages or to be more formal than its predecessors. Because of this, learning regex is often language-specific (most of the time you create and process regexes from your favorite language) and sometimes even package-specific. Needless to say, regex may be more difficult to master than assembly programming.

  • Useful web applications for testing expressions are regex101 and regexr.

  • Match all characters between two strings with lookbehind and lookahead patterns. Notice that this requires the enclosing strings to be fixed (at least under PCRE, where a lookbehind must have a fixed length). For processing the WallyTutor.jl documentation I have used a different approach, less general than what is proposed here.

  • Match any character across multiple lines with (.|\n)*.

  • Currently joining regexes in Julia might be tricky (because of escaping characters); a solution is proposed here and seems to work just fine with minimal extra coding.
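To make the lookbehind/lookahead and multi-line patterns listed above concrete, here is a minimal sketch using Python's re module, which, like PCRE, requires a fixed-width lookbehind. The BEGIN and END markers and the sample text are made up for the example.

```python
import re

# Hypothetical log excerpt with a block delimited by fixed strings.
log = """header
BEGIN
first line
second line
END
footer"""

# The lookbehind and lookahead keep the delimiters out of the match;
# (.|\n)*? lazily matches any character, including line breaks.
between = re.compile(r"(?<=BEGIN\n)(.|\n)*?(?=\nEND)")
body = between.search(log).group(0)

# Same idea with the DOTALL flag, which lets "." match "\n" directly.
between_dotall = re.compile(r"(?<=BEGIN\n).*?(?=\nEND)", re.DOTALL)
body_dotall = between_dotall.search(log).group(0)

print(body)
```

Both patterns extract the two lines between the markers. The lazy quantifier (*?) matters here: a greedy version would run up to the last END in the text instead of the first one.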


$\LaTeX$

Math typesetting with $\LaTeX$

  • For integrals to display the same size as fractions expanded with \dfrac, place a \displaystyle in front of the \int command.
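As a small illustration of this tip (a sketch assuming the amsmath package is loaded for \dfrac; the integrand is arbitrary), compare the two inline formulas below:

```latex
% Requires \usepackage{amsmath} for \dfrac.
Text-style integral: $\dfrac{1}{b-a} \int_{a}^{b} f(x)\,dx$

Display-style integral: $\dfrac{1}{b-a} \displaystyle\int_{a}^{b} f(x)\,dx$
```

In the second line the integral sign is rendered at the same scale as the expanded fraction.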

Code typesetting with $\LaTeX$

  • For some reason, minted blocks \begin{minted}...\end{minted} have problems rendering in Beamer (something related to multilevel macros). I managed to insert code blocks with \inputminted as reported here.

  • Beamer has some issues with footnotes, especially when using column environments; a quick fix for this is to use \footnotemark and \footnotetext[<number>]{<text>} as described here. Notice that \footnotemark automatically generates the counter for use as <number> in \footnotetext.

  • For setting a background watermark in Beamer one can use package background and display it using a Beamer template as described here.

MiKTeX

LaTeX Workshop


Python

Installing packages behind proxy

To install a package behind a proxy that interferes with SSL, one can mark the package index hosts as trusted, which skips certificate verification for them and allows the installation. This is done with the following options:

pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org <pkg>

Extracting text from PDF

This section provides a way to export reference text from PDF files.

The engine uses a combination of tesseract and PyPDF2 to perform the data extraction. Nonetheless, human curation of extracted texts is still required if readability is a requirement. If the quality of automated extractions is often poor for a specific language, you might want to search the web for how to train tesseract; that topic is not covered here.

Besides Python you will need:

  • Tesseract (and a language pack) for extracting text from PDF.
  • ImageMagick for image conversion.
  • Poppler utils for PDF conversion.

Install dependencies on Ubuntu 22.04:

sudo apt install  \
    tesseract-ocr \
    imagemagick   \
    poppler-utils

In case of Rocky Linux 9:

sudo dnf install           \
    tesseract              \
    tesseract-langpack-eng \
    ImageMagick            \
    poppler-utils

For Windows you will need to manually download both tesseract and poppler and place them somewhere on your computer. The full paths to these libraries and/or programs are provided through the optional arguments tesseract_cmd and poppler_path of Convert.pdf2txt.

Create a local environment, activate it, and install required packages:

python3 -m venv venv

source venv/bin/activate
    
pip install              \
    "pdf2image==1.17.0"  \
    "pillow==11.0.0"     \
    "PyPDF2==3.0.1"      \
    "pytesseract==0.3.13"

Now you can use the basic module pdf_convert provided here.
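The module itself is not reproduced in these notes. For illustration only, the sketch below shows how the packages installed above could be combined: embedded text is read with PyPDF2 and a page is sent to tesseract only when it appears to be a scan. The function names pdf_to_text and needs_ocr, the 20-character threshold and the 300 dpi setting are assumptions of this sketch and not the actual pdf_convert API; only the tesseract_cmd and poppler_path argument names are borrowed from the text above.

```python
def needs_ocr(text, min_chars=20):
    """Heuristic: a page with almost no embedded text is likely a scan."""
    return len(text.strip()) < min_chars


def pdf_to_text(pdf_path, lang="eng", tesseract_cmd=None, poppler_path=None):
    """Extract text page by page, falling back to OCR where needed."""
    # Third-party imports are local so the helper above can be used
    # (and tested) even before the environment is fully installed.
    import pytesseract
    from pdf2image import convert_from_path
    from PyPDF2 import PdfReader

    # On Windows, point pytesseract at the tesseract executable.
    if tesseract_cmd is not None:
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    reader = PdfReader(str(pdf_path))
    pages = []

    for number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""

        if needs_ocr(text):
            # Rasterize only this page with poppler, then run tesseract.
            images = convert_from_path(
                str(pdf_path),
                dpi=300,
                first_page=number,
                last_page=number,
                poppler_path=poppler_path,
            )
            text = pytesseract.image_to_string(images[0], lang=lang)

        pages.append(text)

    return "\n\n".join(pages)
```

A call such as pdf_to_text("paper.pdf") returns the text of all pages separated by blank lines; on Windows the two optional arguments would carry the paths mentioned above. As noted earlier, the OCR output still needs human curation.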