Lambda GPU Cloud
provides instant access to high-performance cloud GPUs at the best prices on
the market.
Note
On January 1, 2024, the
prices of certain instance types
will be increasing. These increases will apply to both newly launched
instances and instances already running. These price increases will help us
add more GPUs throughout 2024 to address availability concerns from the
community.
Note
Beginning December 13, 2023, new Lambda GPU Cloud instances will launch with
Ubuntu 22.04 instead of Ubuntu 20.04. Currently running instances won’t be
affected by this change.
Significantly, Ubuntu 22.04 includes Python 3.10, while Ubuntu 20.04 includes
Python 3.8.
2 - Can I access my persistent storage without an instance?
You can’t access your persistent storage file systems without attaching them
to an instance at the time the instance is launched.
For this reason, it’s recommended that you keep a local copy of the files you
have saved in your persistent storage file systems.
Note
File systems can’t be attached to running instances.
Moreover, file systems can only be attached to instances in the same region.
For example, a file system created in the us-west-1 (California, USA) region
can only be attached to instances in the us-west-1 region.
File systems can’t be transferred from one region to another. However, you can
copy data between file systems using tools such as rsync.
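As a rough sketch of the rsync approach, assume each file system is attached to its own running instance and mounted at /home/ubuntu/FILE-SYSTEM-NAME. The helper below prints (but doesn't run) a copy command so you can review it first. The function name fs_copy_command and all of its arguments are hypothetical placeholders, and the rsync options shown are only one reasonable choice.

```shell
# Print (don't run) an rsync command that copies the contents of one file
# system to a file system attached to another instance.
# Arguments (all placeholders): source file system name, destination
# instance IP address, destination file system name.
fs_copy_command() {
  local src_fs="$1" dest_ip="$2" dest_fs="$3"
  # Trailing slashes copy the directory's contents, not the directory itself.
  printf 'rsync -av %s/ ubuntu@%s:%s/\n' \
    "/home/ubuntu/${src_fs}" "${dest_ip}" "/home/ubuntu/${dest_fs}"
}

fs_copy_command SOURCE-FILESYSTEM 203.0.113.1 DEST-FILESYSTEM
```

Run the printed command on the instance that has the source file system attached.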
Lambda GPU Cloud currently doesn’t offer block or object storage.
10 - Can my data be recovered once I've terminated my instance?
Warning
We cannot recover your data once you’ve terminated your instance! Before
terminating an instance, make sure to back up all data that you want to keep.
If you want to save data even after you terminate your instance, create a
persistent filesystem.
Note
The persistent filesystem must be attached when you launch your instance. It
can’t be attached to an instance that’s already running.
When you create a persistent filesystem, a directory with the name of your
persistent filesystem is created in your home directory. For example, if the
name of your persistent filesystem is PERSISTENT-FILESYSTEM, the directory
is created at /home/ubuntu/PERSISTENT-FILESYSTEM. Data not stored in this
directory is erased once you terminate your instance and cannot be
recovered.
11 - Can you provide an estimate of how much a job will cost?
We can’t estimate how much your job will cost or how long it’ll take to
complete on one of our instances. This is because we don’t know the details of
your job, such as how your program works.
However, the performance of our instances is close to what you’d expect from
bare metal machines with the same GPUs.
In order to estimate how much your job will cost or how long it’ll take to
complete, we suggest you create an instance and benchmark your program.
Tip
Check out our GPU benchmarks to form
a general idea of the performance provided by our instances. Keep in mind that
real-world performance doesn’t always match the performance provided by
benchmarks.
We currently don’t support Kubernetes, also known as K8s.
13 - How are on-demand instances invoiced?
Billing for on-demand instances
is in one-minute increments. Billing starts when an instance is launched and
the dashboard shows the instance’s status
is Running.
You’re not billed for the time an instance’s status in the dashboard is
Booting. Similarly, you’re not billed for the time an instance’s status in
the dashboard is Terminating.
Warning
Be sure to terminate any instances that you’re not using!
You will be billed for all minutes that an instance is running, even if the
instance isn’t actively being used.
Invoices are sent weekly for the previous week’s usage.
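To illustrate one-minute billing increments, the sketch below estimates the charge for a given number of minutes in the Running state. The function name ondemand_cost and the hourly rates in the examples are hypothetical; substitute the price shown for your instance type.

```shell
# Estimate an on-demand charge from billed minutes and an hourly rate (USD).
# Usage: ondemand_cost MINUTES HOURLY_RATE
ondemand_cost() {
  awk -v minutes="$1" -v rate="$2" \
    'BEGIN { printf "%.2f\n", minutes * rate / 60 }'
}

ondemand_cost 60 2.00   # prints 2.00
ondemand_cost 90 1.10   # prints 1.65
```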
Note
On-demand instances require us to maintain excess capacity at all times so we
can meet the changing workloads of our customers. For this reason, on-demand
instances are priced higher than reserved instances.
Conversely, we offer
reserved GPU Cloud instances
at a significant savings over on-demand instances, since they allow us to more
accurately determine our capacity needs ahead of time.
14 - How are persistent storage file systems billed?
Persistent storage is billed per GB used per month, in increments of 1 hour.
For example, based on the price of $0.20 per GB used per month:
If you use 1,000 GB of your file system capacity for an entire month (30
days, or 720 hours), you’ll be billed $200.00.
If you use 1,000 GB of your file system capacity for a single day (24
hours), you’ll be billed $6.67.
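The arithmetic in these examples can be reproduced with a short sketch. It assumes a 720-hour (30-day) month, as in the examples above, and defaults to the example price of $0.20 per GB per month. The function name storage_cost is hypothetical.

```shell
# Estimate a persistent storage charge in USD.
# Usage: storage_cost GB_USED HOURS [PRICE_PER_GB_MONTH]
storage_cost() {
  awk -v gb="$1" -v hours="$2" -v price="${3:-0.20}" \
    'BEGIN { printf "%.2f\n", gb * price * hours / 720 }'
}

storage_cost 1000 720   # prints 200.00
storage_cost 1000 24    # prints 6.67
```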
Note
The actual price of persistent storage will be displayed when you create your
file system.
15 - How do I change my billing address?
To change your billing address: in the
Cloud dashboard, at the bottom of the left
sidebar, click Settings. Click
the Billing tab, then click Edit billing address.
16 - How do I change my password?
To reset your Lambda Cloud password, visit the
Reset Password page.
17 - How do I get started using the dashboard?
The dashboard makes it easy to get
started using Lambda GPU Cloud.
Review the license agreements and terms of service. If you agree to them,
click I agree to the above to launch your instance.
In the dashboard, you should now see your instance listed. Once your instance
has finished booting, you’ll be provided with the details needed to begin
using your instance.
It currently isn’t possible to host a demo on an existing instance.
Note
The new instance hosting your demo can be used like any other Lambda GPU Cloud
on-demand instance. For example, you can SSH into the instance and
open Jupyter Notebook on the
instance.
The Demos feature can be hosted on multi-GPU instance types. However, Demos
uses only one of the GPUs.
Also, demos currently can’t be hosted on H100 instances.
Add a demo to your Lambda GPU Cloud account
In the left sidebar of the
dashboard, click Demos. Then,
click the Add demo button at the top-right of the dashboard.
The Add a demo dialog will appear.
Under Demo Source URL, enter the URL of the Git repository containing
your demo’s source code.
Note
The Demos feature looks in your Git repository for a file named
README.md. If the file doesn’t exist, or if the file doesn’t contain the
required properties, you’ll receive a Demo misconfigured error.
The README.md must have at the top a YAML block containing the
following:
---
sdk: gradio
sdk_version: GRADIO-VERSION
app_file: PATH-TO-APP-FILE
---
Replace GRADIO-VERSION with the version of Gradio your demo is built
with, for example, 3.24.1.
Replace PATH-TO-APP-FILE with the path to your Gradio application file
(the file containing the Gradio
interface code),
relative to the root of your Git repository. For example, if your Gradio
application file is named app.py and is located in the root directory of
your Git repository, replace PATH-TO-APP-FILE with app.py.
Properties other than sdk, sdk_version, and app_file are ignored by
the Demos feature.
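Before adding a demo, you might want to check your README.md for the three required properties. The sketch below is a rough local check, not the validation the Demos feature itself performs. The function name check_demo_readme is hypothetical, and the sample file assumes the YAML block is delimited by --- lines. The check only looks for each property name at the start of a line. It doesn't verify the values.

```shell
# Report which required properties are missing from a README.md.
# Prints one "missing: NAME" line per absent property and returns 1 if any
# are missing; returns 0 otherwise.
check_demo_readme() {
  local readme="$1" key status=0
  for key in sdk sdk_version app_file; do
    if ! grep -q "^${key}:" "$readme"; then
      echo "missing: ${key}"
      status=1
    fi
  done
  return "$status"
}

# Example: write a sample README.md to a temporary directory and check it.
tmpdir="$(mktemp -d)"
cat > "${tmpdir}/README.md" <<'EOF'
---
sdk: gradio
sdk_version: 3.24.1
app_file: app.py
---
EOF
check_demo_readme "${tmpdir}/README.md" && echo "README.md looks complete"
```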
Unlisted if you want your demo accessible only by those who know your
demo’s URL.
Under Name, give your demo a name. If you choose to make your demo
public, the name of your demo will appear in the Lambda library of public
models. The name of your demo will also appear in your demo’s URL.
(Optional) Under Description, enter a description for your demo.
The description shows under the name of your demo in your library of demos.
If your demo is public, the description also shows under the name of your
demo in the Lambda library of public models.
Note
You can’t change the name or description of your demo once you add it.
However, you can delete your demo then add it again.
Click Add demo, then follow the prompts to launch a new instance to
host your demo.
Tip
To host a demo that’s already added to your account, in the
Demos dashboard, find the row
containing the demo you want to host, then click Host.
The link to your demo might temporarily appear in the Instances dashboard,
then disappear. This is expected behavior and doesn’t mean your instance or
demo is broken.
The models used by demos are often several gigabytes in size, and can take 5
to 15 minutes to download and load.
Once your instance is launched and your demo is accessible, a link with
your demo’s name will appear under the Demo column. Click the link to
access your demo.
Tip
To see a gallery of all of your demos, at the top-right of the Demos
dashboard, click the See your demos button.
Troubleshooting demos
If you experience trouble accessing your demo, the Demos logs can be helpful
for troubleshooting.
To view the Demos log files, SSH into your instance or open a terminal in
Jupyter Notebook, then run:
sudo bash -c 'for f in /root/virt-sysprep-firstboot.log ~demo/bootstrap.log; do printf "### BEGIN $f\n\n"; cat $f; printf "\n### END $f\n\n"; done > demos_debug_logs.txt; printf "### BEGIN journalctl -u lambda-demos.service\n\n$(journalctl -u lambda-demos.service)\n\n### END journalctl -u lambda-demos.service" >> demos_debug_logs.txt'
This command will produce a file named demos_debug_logs.txt containing the
logs for the Demos feature. You can review the logs from within your instance
by running less demos_debug_logs.txt. Alternatively, you can download the
file locally to review or share.
Note
The Lambda Support team provides only basic
support for the Demos feature. However, assistance might be available in the
community forum.
Here are some examples of how problems present in logs:
Misconfigured README.md file
### BEGIN /home/demo/bootstrap.log
Cloning into '/home/demo/source'...
Traceback (most recent call last):
File "<stdin>", line 17, in <module>
File "<stdin>", line 15, in load
File "pydantic/main.py", line 526, in pydantic.main.BaseModel.parse_obj
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 3 validation errors for Metadata
sdk
field required (type=value_error.missing)
sdk_version
field required (type=value_error.missing)
app_file
field required (type=value_error.missing)
Created symlink /etc/systemd/system/multi-user.target.wants/lambda-demos-error-server.service → /etc/systemd/system/lambda-demos-error-server.service.
Bootstrap failed: misconfigured
### END /home/demo/bootstrap.log
Not a Gradio app
### BEGIN /home/demo/bootstrap.log
Cloning into '/home/demo/source'...
Traceback (most recent call last):
File "<stdin>", line 17, in <module>
File "<stdin>", line 15, in load
File "pydantic/main.py", line 526, in pydantic.main.BaseModel.parse_obj
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 2 validation errors for Metadata
sdk
unexpected value; permitted: 'gradio' (type=value_error.const; given=docker; permitted=('gradio',))
sdk_version
field required (type=value_error.missing)
Created symlink /etc/systemd/system/multi-user.target.wants/lambda-demos-error-server.service → /etc/systemd/system/lambda-demos-error-server.service.
Bootstrap failed: misconfigured
### END /home/demo/bootstrap.log
19 - How do I get started using the Firewall feature?
The Firewall feature allows you to
configure firewall rules to restrict incoming traffic to your instances.
Note
Firewall rules configured using the Firewall feature apply to all of your
instances outside of the Texas, USA (us-south-1) region.
To use the Firewall feature:
Click Firewall in the left sidebar of the dashboard to open your
firewall settings.
Under General Settings, use the toggle next to Allow ICMP traffic
(ping) to allow or restrict incoming ICMP traffic to your instances.
Note
For network diagnostic tools such as ping and mtr to be able to reach
your instances, you need to allow incoming ICMP traffic.
Next to Inbound Rules, click Edit to configure incoming TCP and UDP
traffic rules.
In the drop-down menu under Type, select:
Custom TCP to manually configure a rule to allow incoming TCP traffic.
Custom UDP to manually configure a rule to allow incoming UDP traffic.
HTTPS to automatically configure a rule to allow incoming HTTPS traffic.
SSH to automatically configure a rule to allow incoming SSH traffic.
All TCP to automatically configure a rule to allow all incoming TCP traffic.
All UDP to automatically configure a rule to allow all incoming UDP traffic.
Warning
If you don’t have a rule to allow incoming traffic to port TCP/22, you
won’t be able to access your instances using SSH.
In the Source field, either:
Click the 🔎 to automatically enter your current IP address.
Enter a single IP address, for example, 203.0.113.1.
Enter an IP address range in CIDR notation, for example,
203.0.113.0/24.
To allow incoming traffic from any source, enter 0.0.0.0/0.
If you choose Custom TCP or Custom UDP, enter a Port range.
Port range can be:
A single port, for example, 8080.
A range of ports, for example, 8080-8081.
(Optional) Enter a Description for the rule.
(Optional) Click Add rule to add additional rules.
(Optional) Click the x next
to any rule you want to delete.
Click Update to apply your changes.
Note
The maximum number of firewall rules you can have is 20.
If you have more than 20 rules, new instances you create might not launch.
Also, it’s possible that not all of your rules will be active, which might
leave your instances insecure.
20 - How do I get started using the Team feature?
Create a team
In the dashboard, click Team at the bottom-left of the dashboard. Then,
click Invite at the top-right of the Team dashboard.
Enter the email address of the person you want to invite to your team.
Select their role in the team, either an Admin or a Member. Then,
click Send invitation.
Warning
Be sure to invite only trusted persons to your team!
Currently, the only differences between the Admin and Member roles are
that an Admin can:
Invite others to the team.
Remove others from the team.
Modify payment information.
Change the team name.
This means that a person with a Member role can, for example:
Launch instances that will incur charges.
Terminate instances that should continue to run.
Note
You can’t send an invitation to an email address already associated with a
Lambda Cloud account. If you try to, you’ll be presented with a message
that says there is already a Lambda Cloud account associated with the email
address you’re trying to send an invitation to.
The person you’re inviting to your team must first close their existing
Lambda Cloud account before they can be invited to your team.
The person you invited to your team will receive an email letting them know
that they’ve been invited to a team on Lambda Cloud.
In that email, they should click Join the Team.
Note
Until the person you invited to your team accepts their invitation, they
will be listed in the Team dashboard as Invitation pending.
You can delete the invitation while it’s pending by clicking ⋮ where
the person is listed in your Team dashboard, then choosing Delete
invitation.
Note
If the person you invited to your team doesn’t receive their invitation,
you have to delete their invitation then invite them again.
In the Team dashboard of the person you invited to your team, the person will
see that they are on your team. In your Team dashboard, you’ll see the person
you invited listed.
Change a teammate’s role
To change the role of a person on your team from Member to Admin, click
⋮ where the person is listed in your Team dashboard, then choose Change
to Admin.
Conversely, to change the role of a person on your team from Admin to
Member, click ⋮ where the person is listed in your Team dashboard, then
choose Change to Member.
Close a teammate’s account
To close a teammate’s account, click the ⋮ where your teammate is listed
in your Team dashboard. Then, choose Deactivate user.
Warning
Carefully review the information in the dialog box that pops up.
Change team name
To change the name of your team, click Settings at the bottom-left of the
dashboard, then click Edit team name. Enter a new name for your team, then
click Update team name.
21 - How do I import an SSH key from a GitHub account?
To import an SSH key from a GitHub account and add it to your instance:
Using your existing SSH key, SSH into your instance.
To learn your instance’s private IP address, SSH into your instance and run:
ip -4 -br addr show | grep -E '[[:space:]]10\.'
The above command will output, for example:
enp5s0 UP 10.19.60.24/20
In the above example, the instance’s private IP address is 10.19.60.24.
Tip
If you want your instance’s private IP address and only that address,
run the following command instead:
ip -4 -br addr show | grep -Eo '10\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
The above command will output, for example:
10.19.60.24
Learn what ports on your instance are publicly accessible
You can use Nmap to learn what ports on your instance are publicly
accessible, that is, reachable over the Internet.
Note
The instructions, below, assume you’re running Ubuntu on your computer.
First, install Nmap on your computer (not on your instance) by running:
sudo apt install -y nmap
Next, run:
nmap -Pn INSTANCE-IP-ADDRESS
Replace INSTANCE-IP-ADDRESS with your instance’s IP address, which you can
get from the Cloud dashboard.
The command will output, for example:
Starting Nmap 7.80 ( https://nmap.org ) at 2023-01-11 13:22 PST
Nmap scan report for 129.159.46.35
Host is up (0.041s latency).
Not shown: 999 filtered ports
PORT STATE SERVICE
22/tcp open ssh
Nmap done: 1 IP address (1 host up) scanned in 6.42 seconds
In the above example, TCP port 22 (SSH) is publicly accessible.
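If you only want the list of open ports, you can filter Nmap's output. The sketch below assumes Nmap's normal output format, as in the example above. The function name list_open_ports is hypothetical.

```shell
# Read Nmap's normal output on stdin and print only the open ports,
# for example 22/tcp.
list_open_ports() {
  awk '$1 ~ /^[0-9]+\/(tcp|udp)$/ && $2 == "open" { print $1 }'
}

# Demonstration using sample lines from the output above.
list_open_ports <<'EOF'
Not shown: 999 filtered ports
PORT STATE SERVICE
22/tcp open ssh
EOF
```

To use it with a real scan, run nmap -Pn INSTANCE-IP-ADDRESS | list_open_ports on your computer.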
Note
If nmap doesn’t show TCP/22 (SSH) or any other ports open, your:
For the highest performance when training, we recommend copying your dataset,
containers, and virtual environments from persistent storage to your home
directory. This can take some time but greatly increases the speed of
training.
29 - How long does it take for instances to launch?
Single-GPU instances usually take 3-5 minutes to launch.
Multi-GPU instances usually take 10-15 minutes to launch.
Note
Jupyter Notebook and
Demos can take a few minutes after an
instance launches to become accessible.
Note
Billing starts the moment an instance begins booting.
30 - Is it possible to open ports other than for SSH?
By default, all ports are open to TCP and UDP traffic. ICMP traffic is also
allowed by default.
It’s possible to allow more than one SSH key to access your instance. To do
so, you need to add public keys to ~/.ssh/authorized_keys. You can do
this with the echo command.
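In its simplest form, this is echo 'PUBLIC-KEY' >> ~/.ssh/authorized_keys, where PUBLIC-KEY is a placeholder for the public key you want to add. The sketch below wraps that in a hypothetical helper, add_authorized_key, which skips keys that are already present. The demonstration writes to a temporary file rather than your real authorized_keys file.

```shell
# Append a public key to an authorized_keys file unless that exact line is
# already present.
# Usage: add_authorized_key 'PUBLIC-KEY' [FILE]
add_authorized_key() {
  local key="$1" file="${2:-$HOME/.ssh/authorized_keys}"
  # -x matches whole lines, -F treats the key as a fixed string.
  if ! grep -qxF -- "$key" "$file" 2>/dev/null; then
    echo "$key" >> "$file"
  fi
}

# Demonstration against a temporary file; the key below is a placeholder.
demo_file="$(mktemp)"
add_authorized_key 'ssh-ed25519 PUBLIC-KEY user@example' "$demo_file"
add_authorized_key 'ssh-ed25519 PUBLIC-KEY user@example' "$demo_file"
cat "$demo_file"
```

Adding the same key twice leaves a single line in the file.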
Your account will be permanently banned from Lambda GPU Cloud. Your account
will be referred for collection. Legal action may be taken against you.
34 - What is the capacity of persistent storage file systems?
Each persistent storage file system has a capacity of 8 exabytes, or 8,000,000
terabytes, except for file systems created in the Texas, USA (us-south-1)
region. The capacity of file systems in the Texas, USA (us-south-1) region is
10 terabytes.
You can have a total of 24 file systems.
35 - What network bandwidth does Lambda GPU Cloud provide?
Note
Some sites limit transfer speeds. This is known as bandwidth throttling.
Lambda GPU Cloud doesn’t limit your transfer speeds but can’t control
other sites’ use of bandwidth throttling.
Further, real-world network bandwidth depends on a variety of factors,
including the total number of connections opened by your applications and
overall network utilization.
Utah, USA region (us-west-3)
The bandwidth between instances in our Utah, USA region (us-west-3) can be up
to 200 Gbps.
The total bandwidth from this region to the Internet can be up to 20 Gbps.
Texas, USA region (us-south-1)
The bandwidth between instances in our Texas, USA region (us-south-1) can be
up to 200 Gbps.
The total bandwidth from this region to the Internet can be up to 20 Gbps.
Note
We’re in the process of testing the network bandwidth in our other regions.
36 - What should I do about timeout waiting for RPC from GSP errors?
If your instance’s logs contain error messages about Timeout waiting for RPC
from GSP!, the system software installed on your instance needs to be
upgraded.
Note
nvidia-smi might also produce output similar to the following:
38 - Why am I seeing an error about NMI received for unknown reason?
You can safely disregard the error message: “Uhhuh. NMI received for unknown
reason […].”
This error message might show up in, for example:
The log file /var/log/syslog.
The output of the command dmesg.
The output of the command journalctl.
The error message results from a bug in AMD’s newer processors, including
processors used in our servers. The bug has no impact other than causing the
“NMI received for unknown reason” error message to appear in system logs.
Tip
To learn more about the “NMI received for unknown reason” error message, see:
39 - Why are some instance types grayed out when I try to launch an instance?
If you try to launch an instance from the dashboard and see that the instance
type you want is grayed out, then we’re currently at capacity for that
instance type.
40 - Why can't my program find the NVIDIA cuDNN library?
Unfortunately, the
NVIDIA cuDNN license
limits how cuDNN can be used on our instances.
On our instances, cuDNN can only be used by the PyTorch® framework and
TensorFlow library installed as part of
Lambda Stack.
Other software, including PyTorch and TensorFlow installed outside of Lambda
Stack, won’t be able to find and use the cuDNN library installed on our
instances.
Tip
Software outside of Lambda Stack usually looks for the cuDNN library files in
/usr/lib/x86_64-linux-gnu. However, on our instances, the cuDNN library
files are in /usr/lib/python3/dist-packages/tensorflow.
Creating symbolic links, or “symlinks,” for the cuDNN library files might
allow your program to find the cuDNN library on our instances.
Run the following command to create symlinks for the cuDNN library files:
for cudnn_so in /usr/lib/python3/dist-packages/tensorflow/libcudnn*; do
  sudo ln -s "$cudnn_so" /usr/lib/x86_64-linux-gnu/
done
41 - Why is my card being declined?
Common reasons why card transactions are declined include:
The card is a debit card or a prepaid card
We don’t accept debit cards or prepaid cards. We only accept major credit
cards.
The purchase is being made from a country we don’t support
We currently only support customers in the following regions:
United States
Canada
Chile
Iceland
United Arab Emirates
Saudi Arabia
South Africa
Israel
Taiwan
South Korea
Japan
Singapore
Australia
New Zealand
United Kingdom
Switzerland
European Union (except for Romania)
The purchase is being made while you’re connected to a VPN
Purchases made while using a VPN are flagged as suspicious.
The card issuer is denying our pre-authorization charge
We make a $10 pre-authorization charge to a card before accepting it for
payment, similar to how gas stations and hotels do. If the card issuer denies
the pre-authorization charge, then we can’t accept the card for payment.