General computing skills
Git version control
Version control in Windows
- TortoiseGit: for Windows users, this application adds version control management and other features directly to the file explorer.
Adding submodules
Generally speaking adding a submodule to a repository should be a simple matter of:
git submodule add https://<path>/<to>/<repository>
Nonetheless, this might fail, especially for large repositories. I faced this issue and fixed it by increasing the buffer size, as reported in the link. This solved the issue but led to another problem, which could be solved by downgrading the HTTP protocol.
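The links describing the two fixes are not reproduced here, so the following is only a sketch of what they commonly amount to: raising Git's `http.postBuffer` setting and forcing `http.version` to `HTTP/1.1`. Both are standard Git configuration keys, but the buffer value (about 500 MB) and the helper names `submodule_workaround_commands` and `run_all` are assumptions made for illustration.

```python
import subprocess
from typing import List


def submodule_workaround_commands(url: str, post_buffer: int = 524_288_000) -> List[List[str]]:
    """Build the Git commands for the workaround (nothing is executed here).

    The default buffer size is an assumed value, not one taken from the notes.
    """
    return [
        # Increase the HTTP buffer used when talking to the remote.
        ["git", "config", "--global", "http.postBuffer", str(post_buffer)],
        # Downgrade from HTTP/2 to HTTP/1.1.
        ["git", "config", "--global", "http.version", "HTTP/1.1"],
        # Retry adding the submodule.
        ["git", "submodule", "add", url],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_all(submodule_workaround_commands("https://<path>/<to>/<repository>"))` from inside the parent repository would apply both settings and retry. Dropping `--global` keeps the change local to the current repository.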
Working on Windows
Creating a portable launcher
A simple way to create a portable launcher that needs extra environment variables is to write a batch script that either sets them directly or calls another script containing the definitions:
@echo off
@REM Add variables to be sourced here such as
@REM set PATH="/path/to/some/dir";%PATH%
@REM ... or call another shared script doing so.
@REM call %~dp0\env
MyCode.exe
Because a batch script will keep a console window open, create a VBScript (.vbs) file with the following:
Set oShell = CreateObject ("Wscript.Shell")
Dim strArgs
strArgs = "cmd /c MyCode.bat"
oShell.Run strArgs, 0, false
In the example we assume the program is called MyCode.exe and the batch script has been named analogously, MyCode.bat. Some real world examples are provided here.
Mount a network drive in WSL
Here we assume we will mount drive Z: at /mnt/z:
# Create the mount point (if required):
sudo mkdir /mnt/z
# Mount the network drive in WSL:
sudo mount -t drvfs Z: /mnt/z
Following writing to a file
In PowerShell, this is equivalent to Linux tail -f <file-path>:
Get-Content -Path "<file-path>" -Wait
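When neither PowerShell nor tail is at hand, the same behavior can be approximated in Python. This is a minimal polling sketch, not a replacement for either tool. The function name `follow` and its `poll_interval` and `max_idle` parameters are hypothetical. Like `Get-Content -Wait`, it starts from the beginning of the file.

```python
import time
from typing import Iterator, Optional


def follow(path: str, poll_interval: float = 0.5,
           max_idle: Optional[float] = None) -> Iterator[str]:
    """Yield lines from a file, then keep yielding lines appended to it.

    With max_idle=None the generator waits forever (like tail -f);
    otherwise it stops after roughly max_idle seconds without new data.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        idle = 0.0
        while True:
            line = handle.readline()
            if line:
                idle = 0.0
                # A line still being written may be yielded in pieces.
                yield line.rstrip("\n")
                continue
            if max_idle is not None and idle >= max_idle:
                return
            time.sleep(poll_interval)
            idle += poll_interval
```

A typical use is `for line in follow("<file-path>"): print(line)`, interrupted with Ctrl+C.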
Working on Linux
Several recent Linux distributions use GNOME 3 as the default desktop environment. A few of the changes introduced by this environment are not particularly helpful, and falling back to the classical behavior is useful:
- Add minimize/maximize buttons to the corner of windows
- Include a permanent configurable dock for applications
Regular expressions
Regular expression (or simply regex) processing is a must-have skill for anyone doing scientific computing. Most programs produce results or logs in plain text and do not support extracting specific data from them. That is where regex becomes your best friend. Unfortunately, over the years many flavors of regex have appeared, each claiming to offer advantages or to be more formal than its predecessors. Because of this, learning regex is often language-specific (most of the time you create and process regexes from your favorite language) and sometimes even package-specific. Needless to say, regex may be more difficult to master than assembly programming.
Useful web applications include regex101 and regexr.
- Match all characters between two strings with lookbehind and lookahead patterns. Notice that this requires the enclosing strings to be fixed (at least under PCRE). For processing WallyTutor.jl documentation I have used a more generic approach, though less general than what is proposed here.
- Match any character across multiple lines with (.|\n)*.
- Currently, joining regexes in Julia might be tricky (because of escaping characters); a solution is proposed here and seems to work just fine with minimal extra coding.
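The three points above can be tried out in Python, whose `re` module also requires fixed-width lookbehind patterns. This is only an illustrative sketch: the helper names `between` and `any_of` are made up here, and the Julia solution mentioned in the notes is not reproduced. In Python, `re.escape` takes care of the escaping problem when joining patterns.

```python
import re
from typing import Iterable, Optional, Pattern


def between(text: str, start: str, end: str) -> Optional[str]:
    """Return everything between two fixed strings, across line breaks."""
    pattern = re.compile(
        rf"(?<={re.escape(start)})"  # lookbehind: preceded by the start marker
        r"(?:.|\n)*?"                # any character, newlines included, lazily
        rf"(?={re.escape(end)})"     # lookahead: followed by the end marker
    )
    match = pattern.search(text)
    return match.group(0) if match else None


def any_of(words: Iterable[str]) -> Pattern[str]:
    """Join literal strings into one alternation, escaping metacharacters."""
    return re.compile("|".join(re.escape(word) for word in words))
```

For instance, `between("BEGIN alpha\nbeta END", "BEGIN", "END")` returns the text enclosed by the two markers, including the line break. Passing `flags=re.DOTALL` and using a plain `.` is an equivalent alternative to `(.|\n)` in Python.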
$\LaTeX$
Math typesetting with $\LaTeX$
- For integrals to display at the same size as fractions expanded with \dfrac, place a \displaystyle in front of the \int command.
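As a small illustration of the tip above, here is an inline formula where both the fraction and the integral render at display size. The integrand and limits are arbitrary, and `\dfrac` assumes the `amsmath` package is loaded. The braces limit the scope of `\displaystyle` to the integral.

```latex
% Inline math: \dfrac enlarges the fraction, \displaystyle enlarges the integral.
$\dfrac{1}{b - a} {\displaystyle\int_{a}^{b} f(x)\,\mathrm{d}x}$
```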
Code typesetting with $\LaTeX$
- For some reason, minted blocks \begin{minted}...\end{minted} have problems rendering in Beamer (something related to multilevel macros). I managed to insert code blocks with \inputminted as reported here.
- Beamer has some issues with footnotes, especially when using column environments; a quick fix for this is through \footnotemark and \footnotetext[<number>]{<text>} as described here. Notice that \footnotemark automatically generates the counter for use as <number> in \footnotetext.
- For setting a background watermark in Beamer, one can use the package background and display it using a Beamer template as described here.
MiKTeX
LaTeX Workshop
- Configuring builds in VS Code with LaTeX Workshop for building with pdflatex. In the end I created my own workflows in this file.
Python
Installing packages behind proxy
To install a package behind a proxy that intercepts SSL, one can mark the package index hosts as trusted so that pip skips certificate verification for them and the installation can proceed. This is done with the following options:
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org <pkg>
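When the installation is driven from a script, the same options can be assembled programmatically. The sketch below only builds the command. The helper name `pip_install_command` is hypothetical, and running pip through `sys.executable -m pip` is a choice made here so that the active interpreter is the one used.

```python
import subprocess
import sys
from typing import List, Sequence

# Hosts taken from the pip command shown above.
TRUSTED_HOSTS = ("pypi.org", "files.pythonhosted.org")


def pip_install_command(package: str,
                        trusted_hosts: Sequence[str] = TRUSTED_HOSTS) -> List[str]:
    """Build a pip install command that trusts the given hosts."""
    command = [sys.executable, "-m", "pip", "install"]
    for host in trusted_hosts:
        command += ["--trusted-host", host]
    command.append(package)
    return command


def install(package: str) -> None:
    """Run the installation, raising an error if pip fails."""
    subprocess.run(pip_install_command(package), check=True)
```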
Extracting text from PDF
The module described here provides reference text exported from PDF files.
The engine uses a combination of tesseract and PyPDF2 to perform the data extraction. Nonetheless, human curation of the extracted text is still required if readability matters. If the quality of automated extraction is often poor for a specific language, you might want to search the web for how to train tesseract; that topic is not covered here.
Besides Python you will need:
- Tesseract (and a language pack) for extracting text from PDF.
- ImageMagick for image conversion.
- Poppler utils for PDF conversion.
Install dependencies on Ubuntu 22.04:
sudo apt install \
tesseract-ocr \
imagemagick \
poppler-utils
In case of Rocky Linux 9:
sudo dnf install \
tesseract \
tesseract-langpack-eng \
ImageMagick \
poppler-utils
For Windows you will need to manually download both tesseract and poppler and place them somewhere on your computer. The full paths to these libraries and/or programs are provided through the optional arguments tesseract_cmd and poppler_path of Convert.pdf2txt.
Create a local environment, activate it, and install required packages:
python3 -m venv venv
source venv/bin/activate
pip install \
"pdf2image==1.17.0" \
"pillow==11.0.0" \
"PyPDF2==3.0.1" \
"pytesseract==0.3.13"
Now you can use the basic module pdf_convert provided here.
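The pdf_convert module itself is not reproduced in these notes, so the following is not its implementation. It is only a rough sketch of how the installed packages can be combined: it reads the embedded text layer with PyPDF2 and falls back to OCR with pdf2image and pytesseract on pages where that layer looks empty. The names `needs_ocr` and `extract_pdf_text` and the `min_chars` threshold are assumptions. The `tesseract_cmd` and `poppler_path` arguments mirror the ones mentioned above for Windows.

```python
from typing import List, Optional


def needs_ocr(text: str, min_chars: int = 20) -> bool:
    """Return True when the embedded text layer looks empty or too short."""
    return len(text.strip()) < min_chars


def extract_pdf_text(pdf_path: str, lang: str = "eng",
                     tesseract_cmd: Optional[str] = None,
                     poppler_path: Optional[str] = None) -> List[str]:
    """Return one string per page, using OCR only where it seems needed."""
    # Imported here so that needs_ocr works without the OCR stack installed.
    import pytesseract
    from pdf2image import convert_from_path
    from PyPDF2 import PdfReader

    if tesseract_cmd is not None:
        # Point pytesseract at a manually downloaded tesseract executable.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    pages = []
    reader = PdfReader(pdf_path)
    for number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if needs_ocr(text):
            # Render only this page to an image and run OCR on it.
            images = convert_from_path(pdf_path, first_page=number,
                                       last_page=number,
                                       poppler_path=poppler_path)
            text = "\n".join(pytesseract.image_to_string(image, lang=lang)
                             for image in images)
        pages.append(text)
    return pages
```

Joining the result with `"\n".join(extract_pdf_text("<file-path>"))` gives a single text that still needs the human curation mentioned earlier.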