Siddhesh's Tech Bytes

Learning CSS with games

Siddhesh Agarwal — Sat, 24 Jan 2026 12:54:57 GMT

Some topics in CSS are easier to understand through experience than by reading docs or blogs. Here are some games that helped me understand these topics better:

CSS

CSS Challenges - https://css-challenges.com/
CSS Speedrun - https://css-speedrun.netlify.app/

Selectors

CSS Diner - https://flukeout.github.io/

Flexbox

Flexbox Froggy - https://flexboxfroggy.com/
Flexbox Defense - http://www.flexboxdefense.com/
Flexbox Zombies - https://mastery.games/flexboxzombies/
Flexbox Adventure - https://codingfantasy.com/games/flexboxadventure

Grid

Grid Garden - https://cssgridgarden.com/
Grid Critters - https://mastery.games/gridcritters/
Grid Attack - https://codingfantasy.com/games/css-grid-attack

Anchor

Anchoreum - https://anchoreum.com/

Testing yourself

Here are some websites where you can test your skills against other people around the world or your friends:

CSS Battle - https://cssbattle.dev/
CodinGame - https://www.codingame.com/start/

Copyrights and LLMs

Siddhesh Agarwal — Sun, 30 Mar 2025 14:47:59 GMT

Large Language Models (LLMs) like OpenAI’s GPT, Google’s Gemini, Meta’s LLaMA, and DeepSeek’s R1 have transformed AI by creating human-like text, code, and creative content. However, their quick progress has raised serious concerns about copyright infringement. Many of these models are trained on large amounts of copyrighted material, such as books, articles, research papers, and even proprietary media, without explicit permission from authors or publishers.

This ethical and legal dilemma forces us to ask: Should AI companies be allowed to use copyrighted data without compensation? And if they do, shouldn’t they be required to either:

Open-source their models (to ensure transparency and public benefit), or
Pay for the rights to the copyrighted material they use?

The Problem: LLMs Are Built on Copyrighted Works

Most leading LLMs are trained on datasets scraped from the internet, including:

Books (fiction, non-fiction, academic)
News articles and journalistic content
Research papers and technical documentation
Proprietary code from platforms like GitHub

Many of these sources are protected under copyright law. However, AI companies claim their use qualifies as "fair use"—a legal rule that permits limited use of copyrighted material for things like research, education, or commentary. This argument becomes less convincing when AI-generated content competes with the original works it was trained on, such as AI-written books replacing those by human authors.

Why Should AI Companies Open-Source Their Models?

If AI firms refuse to pay for copyrighted training data, they should at least open-source their models to:

Ensure Transparency: Users and regulators can audit the training data and model behavior.
Prevent Monopolization: Closed-source LLMs give tech giants an unfair advantage, stifling competition.
Enable Public Benefit: Open models allow researchers, startups, and nonprofits to innovate without corporate restrictions.

Meta’s LLaMA and DeepSeek’s models are steps in this direction, but many leading AI systems remain proprietary.

If Not Open-Source, AI Firms Must Pay for Rights

If companies insist on keeping their models closed, they should negotiate licenses with copyright holders. Some possible approaches:

Direct Licensing Deals (e.g., OpenAI partnering with publishers like Axel Springer)
Royalty Systems (compensating authors per AI-generated output)
Opt-out mechanisms (letting creators exclude their work from training datasets)

The New York Times lawsuit against OpenAI and the Indian Media vs. OpenAI case raise a question:

If AI models reproduce paywalled content verbatim, should they be liable for copyright violations?

Conclusion: A Fair Approach to AI and Copyright

The current practice of scraping copyrighted works without permission is unsustainable. AI companies must choose:

Open-source their models to democratize AI and avoid legal risks, or
Pay for licensed data, ensuring creators are fairly compensated.

Without reform, the AI industry risks legal battles, public backlash, and an erosion of trust. The future of AI should be built on ethical data use, not the unchecked exploitation of copyrighted material.

The Pointlessness of FP vs OOP Discussions

Siddhesh Agarwal — Sun, 05 Jan 2025 07:24:41 GMT

In the world of software engineering, discussions about different programming styles are pretty common. The clash between Object-Oriented Programming (OOP) and Functional Programming (FP) often takes centre stage. Each camp claims its approach is the best, pointing to scalability, maintainability, and user-friendliness as key factors. But honestly, this debate tends to be more distracting than anything productive.

The real issue in programming isn’t about sticking to a specific style; it’s about understanding code as Data Flow. Grasping how data moves and gets processed in your system is way more valuable than arguing over whether a class or a reduce function is the way to go.

Average FP vs OOP Debate

This video perfectly sums up the average FP vs OOP debate:

https://youtu.be/lRX5b6SiR3o

Paradigms: Tools, Not Identities

Before we get into data flow, let’s take a moment to clarify our frameworks. Both Object-Oriented Programming (OOP) and Functional Programming (FP) are mental models that help us make sense of code. Think of them as tools rather than strict rules to follow.

OOP focuses on encapsulation and represents the world as a bunch of interacting entities (or objects). This approach works well in areas like UI design or game development.
FP, on the other hand, highlights immutability and composition, which often leads to clean and predictable transformations. This method is particularly useful in data-heavy fields like data pipelines or distributed systems.

The key takeaway is that these paradigms excel in certain situations but can fall short if applied too rigidly. Sticking to just one paradigm can distract us from the more important question: How does data flow through our system?

The Essence of Programming: Data Flow

Programming fundamentally involves three components: inputs, processes, and outputs. For instance, processing user data from a database to display on a webpage illustrates this concept. Regardless of whether you utilize Object-Oriented Programming (OOP) or Functional Programming (FP), the essential steps remain the same:

Input: Fetch data from the database.
Processes: Filter, sort, or format the data.
Output: Render the processed data through HTML.

Concentrating on how data moves and evolves within the system encourages you to think beyond mere syntax and boilerplate.

Understand dependencies.
Minimize unnecessary transformations.
Optimize for performance and clarity.

This perspective naturally integrates the strengths of both OOP and FP while avoiding their pitfalls.
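As a minimal sketch of the input, process, and output steps above: a hard-coded list stands in for the database and a plain string stands in for the HTML template. The names `fetch_users`, `active_sorted`, and `render_html` are made up for illustration. Note that it mixes a small class with plain functions, since neither paradigm is the point here.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class User:
    name: str
    active: bool


def fetch_users() -> list[User]:
    # Input: a hard-coded list stands in for a real database query.
    return [User("Mira", True), User("Arun", False), User("Zoe", True)]


def active_sorted(users: list[User]) -> list[User]:
    # Process: filter and sort without mutating the input.
    return sorted((u for u in users if u.active), key=lambda u: u.name)


def render_html(users: list[User]) -> str:
    # Output: format the processed data as an HTML list.
    items = "".join(f"<li>{escape(u.name)}</li>" for u in users)
    return f"<ul>{items}</ul>"


page = render_html(active_sorted(fetch_users()))
print(page)
```

Each step can be tested on its own, and the last line reads as the data flow itself.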

Data Flow in Practice

Let’s consider a practical scenario involving the development of a recommendation system.

OOP Approach

You might create classes like User, Product, and RecommendationEngine. Each class has methods encapsulating behaviour, such as getRecommendations(). While this can work, it often leads to tight coupling and hard-to-follow chains of method calls, especially in systems with complex logic.

FP Approach

Alternatively, you can view the data as unchangeable structures and use functions to manage it. This approach can create clean and efficient pipelines. However, if you rely too heavily on functional programming, it may become confusing and require significant mental effort to keep track of what's happening in between.

Data Flow Approach

Instead of starting with paradigm constraints, begin by modelling the flow of data:

Data Collection: Aggregate user interactions and product metadata.
Processing: Utilize algorithms such as collaborative filtering or content-based filtering.
Delivery: Serialize the results and send them to the client.

Tools and abstractions, including classes, functions, and other types, should primarily facilitate this flow. By concentrating on the sequence of transformations, you can:

Simplify complex systems by dividing them into distinct, testable components.
Select the appropriate paradigm or library for each step.
Steer clear of early optimization and avoid making things more complicated than necessary.
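The three stages above could be sketched roughly like this. The sample data and the function names are hypothetical, and the scoring is a crude co-occurrence count used as a stand-in for a real collaborative filtering algorithm.

```python
import json
from collections import Counter

# Hypothetical sample data: (user, product) interaction pairs.
INTERACTIONS = [
    ("ana", "laptop"), ("ana", "mouse"),
    ("ben", "laptop"), ("ben", "keyboard"),
    ("cy", "mouse"), ("cy", "keyboard"), ("cy", "monitor"),
]


def collect(interactions):
    # Data collection: group products by user.
    by_user = {}
    for user, product in interactions:
        by_user.setdefault(user, set()).add(product)
    return by_user


def recommend(by_user, target, limit=2):
    # Processing: count products owned by users who overlap with the target.
    seen = by_user[target]
    scores = Counter()
    for user, products in by_user.items():
        if user != target and products & seen:
            scores.update(products - seen)
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:limit]]


def deliver(target, products):
    # Delivery: serialize the result for the client.
    return json.dumps({"user": target, "recommendations": products})


payload = deliver("ana", recommend(collect(INTERACTIONS), "ana"))
print(payload)
```

Swapping the scoring for a proper algorithm only touches `recommend`; the collection and delivery steps stay as they are.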

Moving Beyond Paradigm Wars

The main point is simple: paradigms are not solutions; they are ways to understand problems. Sticking to one paradigm as the "right" way limits creativity and hides the real goal of programming, which is to solve problems effectively. By shifting focus to data flow:

Clarity Improves: Your understanding of the problem should align with its true nature rather than being influenced by the peculiarities of established paradigms.
Flexibility Increases: You can take a hybrid approach by incorporating the best ideas from various paradigms.
Efficiency Gains: Enhancing data movement and transformation usually results in improved performance over sticking to arbitrary design principles.

Conclusion

The debate between Object-Oriented Programming (OOP) and Functional Programming (FP) highlights a basic problem: we often confuse tools with solutions. By seeing code as the flow of data, we can look past this simple argument and concentrate on what matters. Instead of getting stuck in debates about paradigms, think about how data moves and how we can make that movement simpler and clearer. That's where real progress happens.

TL;DR

The ongoing debate between Object-Oriented Programming (OOP) and Functional Programming (FP) is pointless. It doesn't matter if you use FP or OOP as long as you solve the problem. Developers should focus on the main idea of data flow, which is how data moves, changes, and adds value to a system. By focusing on good data movement and transformation, we can make our projects clearer, more adaptable, and more efficient. Thinking about data flow helps programmers handle challenges more effectively.

RAG Model using Langchain.py and ChromaDB

Siddhesh Agarwal — Sun, 28 Apr 2024 03:09:47 GMT

Today, I will discuss creating a Retrieval Augmented Generation (RAG) Model on your custom data using Python, LangChain, and ChromaDB (or any VectorStore you choose).

You can find the source code here: Siddhesh-Agarwal/django-rag (github.com)

What is a RAG Model?

To put it simply, a RAG Model is a Large Language Model (LLM) connected to a "Retrieval Agent". A retrieval agent fetches documents from a "VectorStore", typically using an Approximate Nearest Neighbor (ANN) search. A VectorStore (or Vector Database) is a special type of database that stores information as high-dimensional vectors, and it plays a role similar to a Knowledge Base. These vectors are the "text embeddings" of the information. Since LLMs do not operate on raw text, sentences are tokenized (broken down into smaller units called "tokens", which may or may not correspond to whole words) and converted to their vector equivalents. Each embedding model produces its own embeddings, so vectors from different models are not interchangeable.
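To make the retrieval idea concrete, here is a toy sketch. The three-dimensional vectors are invented by hand, and the search is an exact nearest-neighbour scan using cosine similarity rather than a real ANN index, so treat it as an illustration of the concept and not of how ChromaDB works internally.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Made-up "embeddings"; real ones have hundreds or thousands of dimensions.
STORE = {
    "Django is a Python web framework": [0.9, 0.1, 0.0],
    "Flexbox aligns items in a row": [0.0, 0.2, 0.9],
    "Flask is a Python microframework": [0.8, 0.3, 0.1],
}


def retrieve(query_vector, k=2):
    # Exact search: score every stored vector and keep the k most similar texts.
    ranked = sorted(
        STORE,
        key=lambda text: cosine_similarity(query_vector, STORE[text]),
        reverse=True,
    )
    return ranked[:k]


results = retrieve([1.0, 0.2, 0.0])
print(results)
```

An ANN index trades a little accuracy for speed so that the store does not have to compare the query against every vector.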

What is the use of it?

There are 2 major problems while using Large Language Models:

Training Date Cutoff: LLMs do not have information after their training cutoff date, so they cannot answer questions about the latest events.
Hallucinations: LLMs tend to "imagine" certain information to answer questions. This may be because the relevant information is missing or is getting lost due to the context size of the model. This results in the model spitting out factually inconsistent information.

We can solve both of these problems using RAG Models.

READ MORE: [2311.13878] Minimizing Factual Inconsistency and Hallucination in Large Language Models (arxiv.org)

We can create custom Chatbots that answer questions based on our custom data. For example, a car manufacturer could make a chatbot for its website that answers basic questions!

Building a RAG Model

We can build our RAG Model in 4 simple steps.

Step 1: Selecting LLM and VectorStore

While this step is the simplest, it is also the most crucial in the process. You can select any VectorStore and any LLM. For simplicity, let's choose ChromaDB and the GPT 3.5-Turbo Model (requires an OpenAI API key).

Step 2: Data Extraction and Document Loading

First, we need to extract our data. It could be in the form of a text file, a PDF, or even a webpage. To extract the data, we should use Langchain's document loaders. For instance, if we are developing a Chatbot trained on the Django documentation, we should start by using a document loader to fetch the documents from the website:

from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader
from bs4 import BeautifulSoup

loader = RecursiveUrlLoader(
    url="https://django.readthedocs.io/en/stable/",
    max_depth=3,
    extractor=lambda x: BeautifulSoup(x, "html.parser").text,
)

Now that we have created a loader, we load the documents:

docs = loader.load()

While this gets the job done, loader.load() has two issues. First, it produces documents of inconsistent size: one page might be very large while another is tiny. Second, there is no maximum size for a document, so even when the right document is retrieved, the LLM can lose context because the document is too large. Both issues can be resolved with the loader's load_and_split() method:

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
docs = loader.load_and_split(text_splitter=text_splitter)

The idea behind a "text splitter" is to split the documents into small uniformly sized documents.
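As a rough illustration of that idea, here is a naive fixed-size splitter with overlapping chunks. It is only a sketch: the function name and the sizes are arbitrary choices, and LangChain's splitter is more careful about where it cuts the text.

```python
def split_text(text, chunk_size=40, overlap=10):
    # Slide a fixed-size window over the text; consecutive chunks share
    # `overlap` characters so that context is not lost at the boundaries.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


page = "Django is a high-level Python web framework that encourages rapid development."
chunks = split_text(page)
for chunk in chunks:
    print(repr(chunk))
```

Every chunk is at most `chunk_size` characters long, which gives the retriever documents of a predictable size.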

Step 3: Creating Embeddings and Storage

Now we have to add the documents to a Vector Store. But first, we need to initialize the VectorStore:

from langchain_community.vectorstores import Chroma
from langchain_openai.embeddings import OpenAIEmbeddings
import chromadb

client = chromadb.PersistentClient()
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
db = Chroma(
    client=client,
    collection_name="django",
    embedding_function=embeddings,
)

Now, we add the documents to the store:

db.add_documents(docs)

You may hit a RateLimitError if you have a lot of data and are on the free tier or tier 1. To avoid being rate limited, we can add the documents in small batches, pausing briefly between batches:

import time

# change these variables to experiment
batch_size = 10
sleep_time = 1
for i in range(0, len(docs), batch_size):
    db.add_documents(docs[i : i + batch_size])
    time.sleep(sleep_time)

After this step is done, you have successfully embedded all the data in a VectorStore.

Step 4: Connecting Retriever to LLM

Now comes the nice part: we create a retriever instance, an LLM instance, and a prompt template, then join all 3 to form an LLM chain:

from langchain.prompts.chat import ChatPromptTemplate
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai.llms import OpenAI

retriever = db.as_retriever()
model = OpenAI(temperature=0)
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

query = "What are the features of Django?"
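# stream() returns a generator, so iterate over it to print the answer
for chunk in chain.stream(query):
    print(chunk, end="", flush=True)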

Congrats, you have made your very own RAG Model that answers questions from the Django documentation!!!

Deployment

The RAG Model can be deployed using Streamlit.

To learn more about Streamlit, check out my previous article on Streamlit for ML deployment.

Deploy ML Models with Streamlit

Siddhesh Agarwal — Wed, 24 Apr 2024 03:53:16 GMT

Streamlit is an open-source Python library that can make and deploy beautiful-looking web apps in a few minutes. It has been quite a handy tool for deploying ML models without creating API endpoints in Flask or FastAPI.

Today, I will talk about some Streamlit functions I have used that could help you too! As a demo, this is what a Streamlit app looks like:

Source: Siddhesh-Agarwal/Skin-Cancer-Detection: A web app to detect Skin cancer using pictures of moles and other marks on skin (github.com)

Creating

To begin, create an app.py file or use a template. In your app.py file, simply type:

import streamlit as st

st.title("Test")

Magic

Anything written in the Python script between triple quotes (""") will be rendered as markdown text!

"""
# Heading 1
## Heading 2
### Heading 3

1. Ordered list 1
2. Ordered list 2
3. Ordered list 3
"""

Components

Write

Described as the "Swiss Army knife of Streamlit commands", `st.write()` is one of the most versatile functions to display any data type.

st.write("_Hello_ *World*")
st.write(df)

Data Elements

JSON
```
 st.json(dict_or_json_obj)
```
DataFrame
```
 st.dataframe(df)
```
Table
```
 st.table(df)
```

Metrics

 st.metric("Temperature", "39°C", "1°C")
 st.metric("Wind", "12 kmph", "-8%")

Input Widgets

Text Input

 name = st.text_input("Enter your name")
 password = st.text_input("Enter the password", type="password")
 st.write(f"Your name is {name} and your password is {password}")

Number Input

 age = st.number_input("Enter your age", min_value=18, max_value=120)
 st.write("Your Age is:", age)

Slider

 age = st.slider("Enter your age", min_value=18, max_value=120)
 st.write("Your Age is:", age)

Date Input

 from datetime import date

 bday = st.date_input("When's your birthday", date(2019, 7, 6))
 st.write('Your birthday is:', bday)

Button

 if st.button("Click me"):
     st.write("Well Done")

Radio

 food_items = ["Pizza 🍕", "Burger 🍔", "Spaghetti 🍝"]
 food = st.radio("What do you want to eat?", food_items)
 if food:
     st.write(f"You are eating {food}!")

Select Box

 food_items = ["Pizza 🍕", "Burger 🍔", "Spaghetti 🍝"]
 food = st.selectbox("What do you want to eat?", food_items)
 if food:
     st.write(f"You are eating {food}!")

Multiselect

 food_items = ["Pizza 🍕", "Burger 🍔", "Spaghetti 🍝"]
 food = st.multiselect("What do you want to eat?", food_items)
 if food:
     st.markdown(f"You are eating {', '.join(food)}!")

Toggle

 toggle = st.toggle("Display Dataframe")
 if toggle:
     st.write(df)

File Uploader

file = st.file_uploader("Upload a file")

Layouts

Columns

 cols = st.columns(2)
 with cols[0]:
     st.write("This is Column 1")
 with cols[1]:
     st.write("This is Column 2")

Container

 c = st.container()
 st.write("Line 3")
 c.write("Line 1")
 c.write("Line 2")

Expander

 with st.expander("Click to Open"):
     st.write("Here is some more content")

Sidebar

 with st.sidebar:
     st.write("This will be displayed in the sidebar")

Tabs

 tabs = st.tabs(["Milk", "Cookies"])
 with tabs[0]:
     st.write("You chose Milk 🥛")
 with tabs[1]:
     st.write("You chose Cookies 🍪")

Status Elements

Spinner

 with st.spinner("Loading..."):
     perform_task()

Toast

 st.toast("Process completed successfully!")

Balloons / Snow
```
 st.balloons()
 st.snow()
```

Status Boxes

 st.success("Success")
 st.info("Information")
 st.warning("Warning")
 st.error("Error")

Control Flow

Forms

 with st.form(key='some_form'):
     name = st.text_input("Name")
     age = st.number_input("Age")
     if st.form_submit_button("Submit"):
         st.balloons()

Rerun
```
 st.rerun()
```
Stop Execution
```
 st.stop()
```

Configuration

Streamlit allows the configuration of colour along with some other things. To customize, create a .streamlit/config.toml file. You can do something like this to change the classic black-and-white style to something more colourful:

[theme]
primaryColor="#F63366"
backgroundColor="#FCF2E5"
secondaryBackgroundColor="#F8E4C7"
textColor="#302730"

Learn More: config.toml - Streamlit Docs

Secrets

To manage secrets (like environment variables and API Keys), Streamlit has a custom solution. A special file (./.streamlit/secrets.toml) keeps all the secrets.

OPENAI_API_KEY=""

Deployment

The first step is to identify the input taken by the model and input those details.

 pic = st.file_uploader(
     label="Upload a picture",
     type=["png", "jpg", "jpeg"],
     accept_multiple_files=False,
     help="Upload a picture of a cat or dog",
 )

The second step is to preprocess the image to fit the model (this preprocessing depends on your model).

The next step is to load your model (TIP: cache the model to prevent loading time for subsequent runs)

 import tensorflow as tf

 @st.cache_resource
 def load_model():
     # Cached, so the model is loaded once and reused across reruns
     model = tf.keras.models.load_model("./model/model.h5")
     return model

Pass the processed image to the model and display the output.
```
 model = load_model()
 st.write(model.predict(img))
```

Execution

To run the file, run:

$ streamlit run app.py

NOTE: You can learn about the different components in streamlit through their official docs.

Publishing Packages using Poetry

Siddhesh Agarwal — Sat, 03 Feb 2024 08:40:05 GMT

Poetry has rapidly gained recognition in the Python community and is now the dependency manager of choice for many projects. But did you know that Poetry can be used not only as a dependency manager but also for publishing Python packages to PyPI? I have previously written about publishing to PyPI using the tedious setup.py-bdist-twine method:

https://blog.siddhesh.tech/packaging-and-publishing-on-pypi

Initiating a project

To start a new project, run the following command:

poetry new poetry-demo

This will create a directory poetry-demo with the following file structure:

poetry-demo
├── pyproject.toml
├── README.md
├── poetry_demo
│   └── __init__.py
└── tests
    └── __init__.py

NOTE: Check the availability of the name on PyPI

Adding Dependencies

To add dependencies to the project, we run the command:

poetry add

For example, to add pandas as our dependency, we run the command:

poetry add pandas

If you are adding a dependency for the first time in a project, you will notice a new poetry.lock file being generated. It contains dependencies along with their corresponding versions and checksums/hashes. This helps users to always be on the same dependency versions. It also results in faster dependency conflict resolution.

Read more about poetry

Poetry also allows us to create dependency groups. We can do that like this:

poetry add black isort --group dev

Dependency groups can only be installed through Poetry (they are not pip extras), so the installation command will now be:

poetry install --with dev

Tweaking `pyproject.toml`

The pyproject.toml file contains metadata similar to the setup.py file. We can enter various things like name, version, description, license, authors, readme, repository, keywords and classifiers, among other things.

Read more about the pyproject.toml file

Publishing on PyPI

Before publishing our project on PyPI, we need to create a build:

poetry build

This generates a dist folder that contains a tar.gz and a whl file. Poetry works only for vanilla Python builds (no C/C++/Rust builds 😞). Now, to publish these files to PyPI, we run the command:

poetry publish

You will be prompted for your PyPI username and password. Enter them to upload the packages. Instead of running 2 separate commands, you could combine them and run this instead:

poetry publish --build

And congrats, you published a library using poetry!

Encryption and Emojis!

Siddhesh Agarwal — Tue, 15 Aug 2023 16:48:40 GMT

Almost nine months ago, I published my 3rd Python library - Cryptmoji. You may have come across a ton of cryptography libraries on PyPI. Many of them are far more secure, but I aimed to use the "Caesar Cipher" and "mapping" to make a fun tool for learning cryptography. To install Cryptmoji, run the following:

pip install cryptmoji

How to use Cryptmoji

Encryption

To encrypt text, we use the encrypt() function.

>>> from cryptmoji import encrypt
>>> text = "H3LL0 W0RLD"
>>> encrypted = encrypt(text)
>>> encrypted
'🌾🌜🍂🍂🌙🌉🍍🌙🍈🍂🌺'

The output is very different from that of regular encryption algorithms, isn't it?

Decryption

To decrypt text, we use the decrypt() function.

>>> from cryptmoji import decrypt
>>> encrypted = "🌾🌜🍂🍂🌙🌉🍍🌙🍈🍂🌺"
>>> decrypt(encrypted)
'H3LL0 W0RLD'

And Voila! We have our string back!

Using the `key` parameter

As you can see, the output starts to become predictable after some time. So we need another parameter to shuffle the characters. This parameter is key. It will drastically change the encrypted string.

For example:

>>> from cryptmoji import encrypt, decrypt
>>> key = "HI_M0M"
>>> encrypted = encrypt("H3LL0 W0RLD", key=key)
>>> encrypted
'🎇🍲🎮🎐🍖🍣🎢🍯🎴🎐🍪'
>>> decrypt(encrypted, key=key)
'H3LL0 W0RLD'
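To give a feel for how a Caesar shift and an emoji mapping can be combined, here is a toy sketch. It is not Cryptmoji's actual algorithm: the base code point, the shift, and the function names are assumptions made for illustration, and it only handles characters with code points below 256.

```python
BASE = 0x1F300  # assumed start of an emoji block; not necessarily what Cryptmoji uses


def toy_encrypt(text, shift=3):
    # Caesar step: shift each code point. Mapping step: move it into the emoji range.
    return "".join(chr(BASE + (ord(ch) + shift) % 256) for ch in text)


def toy_decrypt(emojis, shift=3):
    # Undo the mapping, then undo the shift.
    return "".join(chr((ord(ch) - BASE - shift) % 256) for ch in emojis)


secret = toy_encrypt("H3LL0 W0RLD")
print(secret)
print(toy_decrypt(secret))
```

With a fixed shift, identical letters always map to the same emoji, which is exactly the predictability that a key is meant to reduce.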

For more details, refer to the documentation and GitHub repo.

Formatter for Jupyter notebooks

Siddhesh Agarwal — Sat, 18 Feb 2023 06:14:56 GMT

Introduction

If you are into Data Science or Machine Learning, you have probably come across Jupyter notebooks (.ipynb files). The problem I faced when using Jupyter notebooks was that the black formatter didn't work on them. I had tried using the

$ black notebook.ipynb

command many times. This article is meant to help with code formatting in Python Notebooks.

nbQA

So, we will be using a Python library called nbQA along with code formatters like Black and isort.

Installation

Install the library using:

$ pip install nbqa

Usage

You can use various formatters along with nbQA, and I will demonstrate how to use a few of them. Before trying the formatters, make sure you have installed them already.

black

Format the notebook using black as shown below:

$ nbqa black notebook.ipynb
reformatted notebook.ipynb
All done! ✨ 🍰 ✨
1 files reformatted.

isort

Similarly, format the notebook using isort:

$ nbqa isort notebook.ipynb
Fixing notebook.ipynb

yapf

$ nbqa yapf --in-place notebook.ipynb

autopep8

$ nbqa autopep8 -i notebook.ipynb

mdformat

To format the markdown cells in your notebook, use:

$ nbqa mdformat notebook.ipynb --nbqa-md --nbqa-diff

doctest

To run tests for IPython notebooks using doctests:

$ nbqa doctest notebook.ipynb

I hope you liked it. That's all for this time.

Using if __name__ == '__main__'

Siddhesh Agarwal — Mon, 05 Sep 2022 17:16:46 GMT

Many of us have seen the:

import something

def function():
    pass

if __name__ == '__main__':
    # function calls here...
    pass

in one documentation or the other, but what is it?

Short Answer

It protects users from accidentally invoking the script when they didn't intend to. It also helps the user identify whether the file has to be executed or not. The lack of it will make identification harder for the user.

Long Answer

In a large project, there are 2 types of files:

Files that contain function definitions.
Files that call those functions and display output.

Files of type 1 do not need to be executed since they won't display output and simply define the functions and classes and constants used.

On the other hand, files of type 2 need to be executed since they display an output. To help the user identify that this file has to be executed, one uses if __name__ == '__main__'.

When the interpreter runs a module (that is the main program), the __name__ variable will have the value __main__.

print(f"__name__ = {__name__}")    # __name__ = __main__

But if the file is imported by another module, then its __name__ variable will be set to the module's name.

import something

print(f"__name__ = {something.__name__}")    # __name__ = something

Conclusion

We can use an if __name__ == "__main__" block to allow or prevent parts of code from being run when the modules are imported.
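The difference can be checked with a self-contained experiment. The sketch below writes a throwaway module to a temporary directory (the module name `greeter_demo` is made up), then imports it once and runs it once as a script using the standard runpy module.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# A throwaway module that records the value of __name__ it sees.
SOURCE = '''
seen_name = __name__

def greet():
    return "hello"

if __name__ == "__main__":
    ran_main = True
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "greeter_demo.py"
    path.write_text(SOURCE)

    # Case 1: imported, so __name__ is the module's own name and the guard is skipped.
    sys.path.insert(0, tmp)
    module = importlib.import_module("greeter_demo")
    sys.path.remove(tmp)
    print(module.seen_name, hasattr(module, "ran_main"))

    # Case 2: executed as a script, so __name__ is "__main__" and the guard runs.
    script_globals = runpy.run_path(str(path), run_name="__main__")
    print(script_globals["seen_name"], script_globals.get("ran_main", False))
```

In both cases `greet()` is defined; only the guarded block behaves differently.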

Packaging and Publishing on PyPI

Siddhesh Agarwal — Tue, 19 Apr 2022 03:17:07 GMT

This post is on "Packaging and Publishing a python library on PyPI".

There is a tutorial on Packaging Python Projects on the official PyPI website, but these docs are... outdated.

What is PyPI?

As the website says:

The Python Package Index (PyPI) is a repository of software for the Python programming language.

This means that PyPI is the official third-party software repository for Python. Anyone (with a PyPI account) can publish a Python Package for the use of the other developers. PyPI hosts these Python packages in the form of sdists (source distributions) or precompiled "wheels".

File structure

You have to maintain a certain file structure before packaging your project.

package_name/
├── src/
│   ├── __init__.py
│   ├── example.py
│   └── py.typed
├── LICENSE
├── pyproject.toml
├── README.md
├── setup.py (or) setup.cfg
└── tests/

So here is what each file does:

src/__init__.py: This file contains all the import statements. We add statements that look like from .example import
src/example.py: This file contains all the function and class definitions.
src/py.typed: An empty file. Read more here
LICENSE: This file contains the license for the code you are publishing.
- You can choose a license from choosealicense.com

pyproject.toml: This file tells build tools (like build and pip) what is required to build your package. For now, this will do:

 [build-system]
 requires = [
     "setuptools>=42",
     "wheel"
 ]
 build-backend = "setuptools.build_meta"

README.md: This file acts as a guide. It gives developers a detailed description of your project.
setup.py (dynamic) or setup.cfg (static) are files that give "setuptools" information about your package (such as the name, version, author etc.). They also inform setuptools which code files to include.
- Writing setup.cfg/setup.py
tests/: This folder is a placeholder for test files.

Packaging Project

First, make sure that the pip installed on the device is of the latest version. To do that, run:

$ pip install --upgrade pip

After this, install build using:

$ pip install --upgrade build

Now, let's package the project. To package the project, run:

$ python -m build

This command will output lots of text and create a build/ folder and a dist/ folder.

The project has been packaged! Now we have to publish it on PyPI.

Publishing on PyPI

The first step to publishing on PyPI is creating an account there. Head over to PyPI and click on Register at the top right corner.

Fill out the registration form and remember the username and password. You will need them while publishing your project.

Now head back to the terminal and run:

$ pip install --upgrade twine

This command will upgrade twine to the latest version if twine is already present. If it isn't present, then it will install the latest version. Now, it is time to upload the package!

$ twine upload dist/*

You will be prompted to enter your PyPI username and password for authentication purposes. After entering them, you will see a URL in the last line of the output. The URL will be of the format https://pypi.org/project/

Congratulations, the Python package has been published on PyPI!!!

Now, let's pip install the library!

$ pip install <package_name>

print() in python

Siddhesh Agarwal — Sun, 16 Jan 2022 06:26:10 GMT

In this post, I will talk about the print() function and its parameters.

So, I was experimenting with the help() function in Python and tried the following command:

>>> help(print)
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.

me, after seeing file and flush arguments in the output:

I knew about the sep and end parameters but not about the other 2 parameters. But what are these unpopular-parameters-that-online-tutorials-do-not-talk-about?

Today, I am going to share what they are. But before that, let me explain what the first 2 parameters are.

sep is the string that is inserted between 2 values. For example:

>>> print(1, 2, 3, sep='-')
1-2-3
>>> print(1, 2, 3) # The default value of "sep" is a space
1 2 3

end is the string that is appended after the last value. For example:

print(1, 2, 3, end='-')
print(4, 5, 6) # The default value of "end" is a newline
print("Hello")

Will print:

1 2 3-4 5 6
Hello

Now that I have explained what sep and end are, let's talk about the 3rd parameter. The 3rd parameter is file.

It is a file-like object (stream). The default value of file is sys.stdout (sys is a built-in module). If you don't pass this argument, the output is printed to the standard output, which is usually the terminal where you execute your code. stdout is short for standard output (the name comes from C, the language Python is implemented in). If you specify a value for file, the output will be written to that file instead. For example:

# the with block closes the file once the output has been written
with open("output.txt", "w+") as f:
    print(1, 2, 3, sep="\n", file=f)

The open("output.txt", "w+") call creates a file called output.txt (or empties it if it already exists), and print() writes the output to it. So our output.txt file will look like this:

1
2
3

This allows us to write values to a file directly without having to convert them to strings first. It also allows us to use the sep and end parameters of the print() function without worrying about how to implement them.
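The file parameter is not limited to files on disk. As a small sketch (the variable names here are my own), any object with a write() method works, such as an in-memory io.StringIO buffer or sys.stderr:

```python
import io
import sys

# Collect the printed text in memory instead of showing it on the terminal.
buffer = io.StringIO()
print("a", "b", "c", sep=", ", end="!\n", file=buffer)
captured = buffer.getvalue()  # the exact text print() produced

# Send a message to the standard error stream instead of standard output.
print("something went wrong", file=sys.stderr)
```

Capturing output like this is handy when you want to test what a piece of code prints.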

Finally, let's talk about the flush parameter. The flush parameter is a boolean value. When it is True, print() "flushes" the stream, i.e. it forces any buffered text to be written out immediately. Let's see a small example for better understanding:

from time import sleep

# output is not flushed here
print("Hello", end=', ')
sleep(5)
print("world!")

would result in:

Hello, world!

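The output looks perfect, but there is a problem: the 5-second pause that was supposed to happen between the 2 words is not there. Because the first print() does not end with a newline, "Hello, " usually stays in the output buffer, and both words appear together after 5 seconds. We'll run the same code now, but this time we'll flush the output stream: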

from time import sleep

# output is flushed this time
print("Hello", end=', ', flush=True)
sleep(5)
print("world!")

Now, when you run the program, "Hello, " will be printed first and then, after 5 seconds, "world!" will be printed.
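If you want to see what flush=True does without waiting 5 seconds, here is a rough sketch that passes a hand-written stream to print(). RecordingStream is a hypothetical class I wrote for this demo (it is not part of Python); it only counts how many times flush() gets called.

```python
class RecordingStream:
    """A minimal file-like object that records writes and flush() calls."""

    def __init__(self):
        self.chunks = []
        self.flush_calls = 0

    def write(self, text):
        self.chunks.append(text)
        return len(text)

    def flush(self):
        self.flush_calls += 1


stream = RecordingStream()

print("Hello", end=", ", file=stream)      # flush defaults to False
calls_without_flush = stream.flush_calls

print("world!", file=stream, flush=True)   # print() calls stream.flush()
calls_with_flush = stream.flush_calls

text = "".join(stream.chunks)
print(calls_without_flush, calls_with_flush, repr(text))
```

The counter only goes up for the second call, which is the one that asked for a flush.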

That's all for now.

id() in python

Siddhesh Agarwal — Sat, 18 Dec 2021 04:28:59 GMT

In this post, I will try to improve your idea about memory in Python using the built-in id() function. For those of you who don't know what id() is:

The id() function returns an integer that identifies an object. The ID is unique among objects that exist at the same time, so no 2 different live objects share an ID. In CPython, the ID is the object's memory address.

So let us begin with a small example:

a = b = 500
print(id(a) == id(b))

Fun fact: In python, id(a) == id(b) is analogous to a is b.

The above code prints True because Python creates a single value 500 and then makes both a and b point to it. This implies that a and b are pointing towards the same memory location and hence have the same ID.

Now let's raise the bar:

a = 500
b = 500
print(id(a) == id(b))

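When the lines are entered one at a time in the interactive CPython interpreter, the above code prints False because:

Python creates a variable a pointing to the value 500 in memory.
Then, it creates another variable b pointing to another value 500 (yeah, both 500s are different objects).

Hence, the two have different IDs because they point to different memory locations. Note that this is a CPython implementation detail: if the same lines are saved in a file and run as a script, the compiler may reuse a single 500 object for both literals, and the code prints True.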
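If you want to try this from a file rather than typing it line by line, one option is to build the numbers at runtime so that each one is created separately. This is just a sketch and it relies on CPython behaviour; the variable names are mine.

```python
# Build each 500 at runtime so the two values are created separately.
a = int("500")
b = int("500")

same_value = a == b          # compares the numbers
same_object = a is b         # compares the identities
ids_match = id(a) == id(b)   # same question as "a is b"

print(same_value, same_object, ids_match)
```

Here == compares the values, while is and the id() comparison both ask whether a and b are the very same object.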

I hope this isn't confusing because there is more to come. Guess the output for this:

a = 50
b = 50
print(id(a) == id(b))

Some of you may think "This is the previous question with different values. I know the answer is False" but not so fast.

For small integers (the CPython range is -5 to 256, both inclusive), the integer objects are shared. This is done entirely to save space. The memory footprint of the interpreter would be significantly larger if these objects weren't sharing their memory.

So the correct answer is True.
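You can poke at the shared range yourself with a short sketch like the one below. The helper is_cached is a name I made up, and the exact range is a CPython implementation detail that may differ in other interpreters or versions, so treat the results outside -5 to 256 as something to observe rather than rely on.

```python
def is_cached(n):
    """Report whether two separately built ints equal to n are one object."""
    return int(str(n)) is int(str(n))


# Check a few values around the edges of the range mentioned above.
for n in (-6, -5, 0, 256, 257):
    print(n, is_cached(n))
```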

Okay, okay. Just one more to go. The last one:

a = 500
id1 = id(a)
a = 500
id2 = id(a)
print(id1 == id2)

Well, even though I am re-assigning the same variable with the same value, the answer is most likely to be False when you run this line by line in the interactive interpreter. I'll explain why. When you re-assign a variable in Python, the interpreter works in the same way as for the first assignment: it creates the new value and makes the variable point to it, and the previous value is discarded once nothing refers to it. So when we write a = 500 the second time, the interpreter creates a new memory location for 500, makes a point towards it, and frees the old value. Both of these IDs are most likely different.

So, the answer is most likely False.

NOTE: If the above example had a number belonging to the inclusive range -5 to 256, the answer would have been True. This is because numbers in that range have a fixed memory location in CPython.
