Any advice for a Linux distribution for Python and data science applications?
#1
I would like to get a Pinebook Pro for data science tasks under Linux with Python and Jupyter Notebook.

Are there any problems I should expect with this setup? 

Does it matter that the PBP uses an ARM CPU?
Are there any libraries that will cause trouble? 

Or can I just "pip install" everything I need without much trouble?

Does the Linux distribution matter? 

Thx
#2
That the PBP is ARM definitely matters.  As far as I know, there is no Anaconda for ARM, and Miniconda for ARM has limitations and does not seem well-supported.  I'm curious if there is a Jupyter kernel that will run on the PBP.

Even if you manage to get it running, it might be very slow.  So while I would not want to discourage anyone from getting a PBP, if the reason is "data science tasks under linux with python and jupyter notebook", then I would strongly consider an x86 machine.
#3
(09-27-2019, 01:39 PM) zaius wrote: That the PBP is ARM definitely matters. [...]

Oh yeah, I totally forgot that data science libraries do their heavy lifting in compiled C/C++ code under the hood (often built and tuned for x86) rather than in Python. Your Python front end will work, but the compiled parts might not. That said, take a look at the Raspberry Pi scene for inspiration; I found berryconda (https://github.com/jjhelmus/berryconda) that way, and there may be ARM versions of other popular tools.
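A quick way to see what pip is dealing with on a given machine is to ask Python which CPU architecture and platform it was built for. This is a minimal sketch using only the standard library (`platform` and `sysconfig`); the function name `describe_platform` is just illustrative. On an ARM64 Linux system such as the PBP, `platform.machine()` typically reports `aarch64`, and pip can only use prebuilt wheels tagged for that architecture; when none exists, it tries to build the package from source (if a source distribution is available), which is where the C/C++ trouble tends to show up.

```python
import platform
import sysconfig

# Prefixes that indicate an ARM CPU ("aarch64", "arm64", "armv7l", ...).
ARM_PREFIXES = ("aarch64", "arm64", "arm")


def describe_platform() -> dict:
    """Report the CPU architecture and the platform string this Python was built for."""
    machine = platform.machine()  # e.g. "aarch64" on ARM64 Linux, "x86_64" on a typical PC
    return {
        "machine": machine,
        "is_arm": machine.lower().startswith(ARM_PREFIXES),
        "platform": sysconfig.get_platform(),  # e.g. "linux-aarch64" or "linux-x86_64"
    }


if __name__ == "__main__":
    info = describe_platform()
    print(info)
    if info["is_arm"]:
        print("ARM CPU: pip needs wheels built for this architecture; "
              "otherwise it tries to compile the package from source.")
```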
#4
There is also no support for Intel MKL, though you can often substitute OpenBLAS or ATLAS for it (depending on the application).
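To check which BLAS library a given NumPy build actually uses (for example, whether you ended up with OpenBLAS rather than MKL), you can ask NumPy itself. This sketch assumes a recent NumPy release (roughly 1.25 or later) where `numpy.show_config(mode="dicts")` returns the build configuration as a dictionary; on older releases the helper returns `None`, and you can read the plain `numpy.show_config()` printout instead. The helper name `numpy_blas_name` is hypothetical.

```python
def numpy_blas_name():
    """Return the name of the BLAS library NumPy reports it was built with, or None."""
    try:
        import numpy as np
    except ImportError:
        return None  # NumPy is not installed in this environment
    try:
        # Assumption: recent NumPy supports mode="dicts" and exposes the BLAS
        # entry under "Build Dependencies" -> "blas" -> "name".
        config = np.show_config(mode="dicts")
        return config["Build Dependencies"]["blas"]["name"]
    except (TypeError, KeyError):
        return None  # older NumPy or a different layout: inspect np.show_config() by eye


if __name__ == "__main__":
    print(numpy_blas_name())
```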

And the GPU has no Linux support for OpenCL, so things are quite constrained... but it's workable.
#5
The PBP will also be quite constrained for heavy-lifting tasks or large datasets. I was planning to use my PBP for data science work too, but all the actual computation will be done on a remote host.
#6
(09-27-2019, 12:20 PM) hugepanic wrote: I would like to get a PinebookPro for data science tasks under linux with python and jupyter notebook. [...]

I remember reading this post a week or so back and thinking "that sounds like a fun challenge". When I used Ubuntu, I managed my Python installation via pip. However, I was never particularly good at it, and found using pip to keep Python package versions compatible a pain in the arse. This was particularly true for Spyder, which seemed to break constantly depending on which versions of pyqt5 and qtwebengine were present. Having switched to Manjaro a while back, I've pretty much abandoned pip in favour of the package versions available from pacman (as these tend to be the latest versions anyway).

I initially tried to replicate my Python setup in the PBP Debian Stretch build, but the version of Python in Debian's repos was quite old. I managed to build the latest version of Python from source on the PBP, but failed to navigate the pip dependency maze required to get Spyder running.

I've since switched to the current Manjaro ARM preview and found setting up Python to be as easy as it is on my desktop machine. Here's what I did for the stuff I use:

Code:
# Update Packages
sudo pacman -Syyu
sudo pacman -S yay base-devel # Access to AUR packages (if yay not already installed)
yay -Sua

# Install data science packages
sudo pacman -S spyder # IDE
sudo pacman -S python-scipy python-numpy python-pandas python-matplotlib python-seaborn python-statsmodels # Data science
sudo pacman -S python-beautifulsoup4 python-mechanize # Web-scraping
yay -S python-lifelines # Data science (survival analysis)

# Stretch available RAM via zswap (compressed swap cache)
sudo pacman -S zswap-arm
sudo systemctl enable zswap-arm
sudo systemctl start zswap-arm

I've not tried Jupyter, but I note that it's present in Manjaro's repos.

The Manjaro KDE desktop plus Spyder uses about 1.0 GB of RAM, leaving you with about 2.8 GB for data science work. That should be enough for anything other than heavy lifting, as noted in another post in this thread.
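If you want to check how much RAM is actually left for your own work rather than eyeballing a system monitor, Linux exposes it in `/proc/meminfo`, where `MemTotal` and `MemAvailable` are reported in kB. A minimal standard-library sketch; the function names are hypothetical, and reading `MemAvailable` assumes a reasonably modern kernel (3.14 or later).

```python
from pathlib import Path

KIB_PER_GIB = 1024 ** 2


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field: value_in_kB} mapping."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])  # values are in kB (really KiB)
    return values


def memory_gib(path: str = "/proc/meminfo") -> tuple:
    """Return (total, available) memory in GiB on a Linux system."""
    info = parse_meminfo(Path(path).read_text())
    return info["MemTotal"] / KIB_PER_GIB, info["MemAvailable"] / KIB_PER_GIB


if __name__ == "__main__":
    total, available = memory_gib()
    print(f"{available:.1f} GiB of {total:.1f} GiB available")
```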

NB: Please ignore my amateurish Python (attached picture). I'm currently teaching myself by porting across some of my old R teaching materials.


Attached file: Spyder.png
#7
I would love to see a stripped-down version of the Pinebook Pro available for students who are interested in learning about Linux, Python 3.6 or later, the command line, etc.
#8
(09-27-2019, 12:20 PM) hugepanic wrote: I would like to get a PinebookPro for data science tasks under linux with python and jupyter notebook. [...]

I'm running Manjaro's image on my eMMC with Jupyter, Pandas, Sympy, Plotly, Numpy, Scikit-learn, matplotlib, statsmodels, tqdm, boto3, pyserial, pytz, flask (for testing APIs for data transformations), and other libraries installed on Python 3.8. My only problem right now is that pandoc doesn't install, so I can't export my notebooks to PDF for non-Jupyter people.

For quick work on small datasets it's fine, but I use Google Colab from my laptop for higher-performance computing (it's free and has GPUs/CUDA), and I will eventually set up an SSH tunnel to my desktop for bigger jobs.

I work mostly with time series sensor data and lab data from controlled experiments and environmental monitoring applications. Nothing is sampled faster than once every 10 seconds (the average interval between data points is 10 minutes), covering hours to months of data. So I can use the PBP as a daily driver: for these tasks it is not much slower than my MacBook Pro in practice (a transformation that takes under a second taking 5x as long isn't terribly noticeable).
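Those sampling rates make it easy to estimate how much memory such data needs. As a rough back-of-the-envelope sketch, assuming one float64 value (8 bytes) per channel per sample and ignoring pandas index and per-object overhead (so treat the result as a lower bound): 30 days at one sample every 10 seconds is 30 × 86,400 / 10 = 259,200 rows, and ten such channels take about 20 MB, far below the PBP's 4 GB. The function names and the 10-channel figure are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_FLOAT64 = 8


def series_rows(days: float, interval_s: float) -> int:
    """Number of samples in `days` of data taken once every `interval_s` seconds."""
    return int(days * SECONDS_PER_DAY // interval_s)


def float64_mib(rows: int, channels: int = 1) -> float:
    """Lower-bound in-memory size, in MiB, of `channels` float64 columns of `rows` values."""
    return rows * channels * BYTES_PER_FLOAT64 / 1024 ** 2


if __name__ == "__main__":
    for label, interval in [("10 s", 10), ("10 min", 600)]:
        rows = series_rows(30, interval)
        print(f"30 days @ {label}: {rows:,} rows, "
              f"{float64_mib(rows, channels=10):.1f} MiB for 10 channels")
```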

Memory is a problem, however: Chromium is the best-performing browser, and it eats into your 4 GB pretty quickly. I run Arglebargle's zram-swap service, which helps.
#9
I am surprised by how little of a problem memory is, actually.

Under load I am using 2 GB of RAM and have never seen swap used.

The performance bottleneck seems to be hardware acceleration and/or browser CPU usage.

Admittedly, this is from Debian sid arm64 + sway git + mesa-git, but it is probably close to other people's experience.
#10
For my setup, I have this working well:

1. Python 3.8 - default install that comes with Manjaro
2. pip - sudo pacman -Syu python-pip
3. Pandas - installed via sudo pacman -Syu python-pandas
4. Numpy - installed automatically as a Pandas dependency
5. Virtualenv - sudo pacman -Syu python-virtualenv

Then I create a virtual environment with the --system-site-packages flag so that it can access the pandas and numpy installed in the global Python installation directory.
Within that virtual env, I pip install JupyterLab and the other PyPI packages I need for API calls, etc. It works great so far for light, initial data exploration (I haven't taken it further, e.g. to large datasets around the 1M-record mark).
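The same setup can be reproduced with Python's built-in `venv` module instead of the third-party `virtualenv` package used above (a swap on my part; `python -m venv --system-site-packages PATH` is the command-line equivalent). This minimal sketch creates an environment that can still import the pandas and numpy installed via pacman; the function name `make_env` and the `~/ds-env` path are hypothetical.

```python
import venv
from pathlib import Path


def make_env(path: str, with_pip: bool = True) -> Path:
    """Create a virtual environment that can also import system-wide packages
    (such as pacman's python-pandas and python-numpy)."""
    venv.create(path, system_site_packages=True, with_pip=with_pip)
    return Path(path)


if __name__ == "__main__":
    env = make_env(str(Path.home() / "ds-env"))  # hypothetical location
    print(f"Activate with: source {env}/bin/activate")
    print("Then install per-project packages, e.g.: python -m pip install jupyterlab")
```

Packages installed with pip inside the environment stay there, while the pacman-managed ones remain shared with the rest of the system.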


[Image: hGb9xWq.png]

