Any advice for a Linux distribution for Python and data science applications? - Printable Version

+- PINE64 (https://forum.pine64.org)
+-- Forum: Pinebook Pro (https://forum.pine64.org/forumdisplay.php?fid=111)
+--- Forum: Linux on Pinebook Pro (https://forum.pine64.org/forumdisplay.php?fid=114)
+--- Thread: Any advice for a Linux distribution for Python and data science applications? (/showthread.php?tid=8007)

Any advice for a Linux distribution for Python and data science applications? - hugepanic - 09-27-2019

I would like to get a Pinebook Pro for data science tasks under Linux with Python and Jupyter Notebook. Are there any troubles I should expect with this setup? Does it matter that the PBP uses an ARM CPU? Are there any libraries that will cause trouble? Or can I just "pip install" everything I need? Does the Linux distribution matter? Thanks

RE: Any advice for a Linux distribution for Python and data science applications? - zaius - 09-27-2019

That the PBP is ARM definitely matters. As far as I know, there is no Anaconda for ARM, and Miniconda for ARM has limitations and does not seem well-supported. I'm curious whether there is a Jupyter kernel that will run on the PBP. Even if you manage to get it running, it might be very slow.

So while I would not want to discourage anyone from getting a PBP, if the reason is "data science tasks under linux with python and jupyter notebook", then I would strongly consider an x86 machine.

RE: Any advice for a Linux distribution for Python and data science applications? - CampGareth - 09-27-2019

(09-27-2019, 01:39 PM)zaius Wrote: That the PBP is ARM definitely matters. As far as I know, there is no Anaconda for ARM, and Miniconda for ARM has limitations and does not seem well-supported. I'm curious if there is a Jupyter kernel that will run on the PBP.

Oh yeah, I totally forgot that data science applications will be using x86-specific C/C++ under the hood instead of Python. Your front end will work, but everything else might not. That said, take a look at the Raspberry Pi scene for inspiration. I found berryconda (https://github.com/jjhelmus/berryconda) that way, and there may be other ARM versions of popular tools.

RE: Any advice for a Linux distribution for Python and data science applications?
- geokon - 10-04-2019

There is also no support for Intel MKL, though you can often substitute OpenBLAS or ATLAS for it (depending on the application). And the GPU has no Linux support for OpenCL. So things are actually quite constrained, but it's workable.

RE: Any advice for a Linux distribution for Python and data science applications? - mmt - 10-30-2019

The PBP will also be quite constrained for any heavy-lifting task or large datasets. I was planning to use my PBP for data science work too, but all the actual work will be done on a remote host.

RE: Any advice for a Linux distribution for Python and data science applications? - User 11436 - 11-17-2019

(09-27-2019, 12:20 PM)hugepanic Wrote: I would like to get a PinebookPro for data science tasks under linux with python and jupyter notebook.

I remember reading this post a week or so back and thinking "that sounds like a fun challenge".

When I used to use Ubuntu, I'd manage my Python installation via pip. However, I was never particularly good at it, and used to find using pip to manage compatible versions of Python packages a pain in the arse, particularly for Spyder, which seems to constantly break depending on the versions of pyqt5 and qtwebengine present. Having switched to Manjaro a while back, I've pretty much abandoned pip in favour of the package versions available from pacman (as these tend to be the latest versions anyway).

I initially tried to replicate my Python setup in the PBP Debian Stretch build, but the version of Python in Debian's repos was quite old. I managed to build the latest version of Python from source on the PBP, but failed to navigate the pip dependency maze required to get Spyder running.

I've since switched to the current Manjaro ARM preview and found setting up Python to be as easy as it is on my desktop machine, i.e. for the stuff I use:

Code:
# Update Packages

I've not tried Jupyter, but I note that it's present in Manjaro's repos.

Manjaro KDE desktop + Spyder = 1.0GB RAM usage, leaving you with about 2.8GB for data science stuff. That should be enough for anything other than heavy lifting, as noted in one of the other posts in this thread.

NB: Please ignore my amateurish Python (attached picture). I'm currently teaching myself by porting across some of my old R teaching materials.

RE: Any advice for a Linux distribution for Python and data science applications? - Norm - 01-21-2020

I would love to see a stripped-down version of the Pinebook Pro available for students who are interested in learning about Linux, Python 3.6 or greater, the command line, etc.

RE: Any advice for a Linux distribution for Python and data science applications? - evantaylor - 01-21-2020

(09-27-2019, 12:20 PM)hugepanic Wrote: I would like to get a PinebookPro for data science tasks under linux with python and jupyter notebook.

I'm running Manjaro's image on my eMMC with Jupyter, Pandas, Sympy, Plotly, Numpy, Scikit-learn, matplotlib, statsmodels, tqdm, boto3, pyserial, pytz, flask (testing APIs for data transformations), and other libraries installed on Python 3.8. My only trouble right now is that pandoc doesn't install, so I can't export my notebooks to PDF for non-Jupyter people.

For quickly working with small datasets, it's fine, but I use Google Colab for higher-performance computing (it's free, has GPUs/CUDA) from my laptop, and will eventually set up an SSH tunnel to my desktop for bigger jobs.

I work mostly with time series sensor data and lab data from controlled experiments and environmental monitoring applications. Nothing below 1 sample per 10 seconds (10 minute average datapoint interval) on hours to months worth of data -- so I can use the PBP as a daily driver, as it is not much slower than my MacBook Pro for these tasks (taking 5x as long on sub-1-second data transformations isn't terribly noticeable).

Memory is a problem, however, as Chromium is the best-performing browser and eats away at your 4GB pretty quickly.
I run Arglebargle's zram-swap service, which helps.

RE: Any advice for a Linux distribution for Python and data science applications? - xmixahlx - 01-22-2020

i am surprised by how little memory is a problem, actually. under load I am using 2GB of RAM and have never seen swap used. the performance bottleneck seems to be hardware acceleration and/or browser CPU usage.

admittedly, this is info from debian sid arm64 + sway git + mesa-git, but it is probably close to other experiences.

RE: Any advice for a Linux distribution for Python and data science applications? - lemaurien19 - 06-21-2020

For my setup, I have this working well:

1. Python 3.8 - default install that comes with Manjaro
2. pip - installed via sudo pacman -Syu python-pip
3. Pandas - installed via sudo pacman -Syu python-pandas
4. Numpy - comes installed with Pandas as a requirement
5. Virtualenv - installed via sudo pacman -Syu python-virtualenv

Then I create a virtual environment with the --system-site-packages flag so that I have access to the pandas and numpy installed in the global Python installation dir. Within that virtual env, I pip install JupyterLab and other PyPI packages that I need for API calls etc.

Works great so far for light, initial data exploration (I haven't taken it further -- large datasets in the 1M records mark).
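
The virtual-environment step in the last post can be sketched in Python. This is only a rough sketch, not lemaurien19's actual commands: it assumes the standard-library venv module in place of the virtualenv package, and the names make_env, sees_system_packages, and ds-env are hypothetical. The idea is the same, though: the environment is created so that it can still see the pandas and numpy installed system-wide through pacman.

```python
import os
import tempfile
import venv
from pathlib import Path


def make_env(env_dir, with_pip=True):
    """Create a virtual environment that can also see system-wide packages.

    system_site_packages=True is the venv counterpart of the
    --system-site-packages flag mentioned in the post.
    """
    venv.create(
        env_dir,
        system_site_packages=True,
        with_pip=with_pip,            # pip is needed to install extra packages later
        symlinks=(os.name != "nt"),   # symlink the interpreter on non-Windows systems
    )
    # venv records its settings in pyvenv.cfg at the root of the environment.
    return Path(env_dir) / "pyvenv.cfg"


def sees_system_packages(cfg_path):
    """Return True if pyvenv.cfg says system site-packages are visible."""
    for line in Path(cfg_path).read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    # Hypothetical location; a temporary directory keeps the demo self-cleaning.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_env(Path(tmp) / "ds-env", with_pip=False)
        print("system packages visible:", sees_system_packages(cfg))
```

In real use the environment would live somewhere permanent, and anything installed with pip after activating it stays inside the environment while the pacman-managed packages remain untouched.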